segfaults building documentation when machine under load


Rik-4
I'm getting a vaguely repeatable situation where building the documentation
fails when the machine doing the work is under stress.

Example errors:

/bin/bash: line 1: 24234 Segmentation fault      (core dumped) /bin/bash
run-octave --norc --silent --no-history --path
/home/rik/wip/Projects_Mine/octave-dev/doc/interpreter/ --eval
"interpimages ('doc/interpreter/', 'interpft', 'txt');"
Makefile:27944: recipe for target 'doc/interpreter/interpft.txt' failed
make[2]: *** [doc/interpreter/interpft.txt] Error 139
make[2]: *** Waiting for unfinished jobs....
fatal: caught signal Segmentation fault -- stopping myself...
fatal: caught signal Segmentation fault -- stopping myself...
fatal: caught signal Segmentation fault -- stopping myself...
/bin/bash: line 1: 25316 Segmentation fault      (core dumped) /bin/bash
run-octave --norc --silent --no-history --path
/home/rik/wip/Projects_Mine/octave-dev/doc/interpreter/ --eval
"interpimages ('doc/interpreter/', 'interpderiv2', 'txt');"
Makefile:27950: recipe for target 'doc/interpreter/interpderiv2.txt' failed
make[2]: *** [doc/interpreter/interpderiv2.txt] Error 139
/bin/bash: line 1: 25338 Segmentation fault      (core dumped) /bin/bash
run-octave --norc --silent --no-history --path
/home/rik/wip/Projects_Mine/octave-dev/doc/interpreter/ --eval "plotimages
('doc/interpreter/', 'hist', 'txt');"
Makefile:27996: recipe for target 'doc/interpreter/hist.txt' failed
make[2]: *** [doc/interpreter/hist.txt] Error 139

Are other people experiencing this as well?  I think I saw something about
the Fedora buildbots also having this issue.

To be sure, I happen to have my local machine stressed.  6 of 8 cores are
pegged and then I am running 'make -j8' to do the build.  Also, uptime
reports an average load of 10.1.

A possible clue is that this usually happens when generating text files,
rather than when trying to generate actual images like png or pdf.  The
text is generated very quickly which means that race conditions might
become more apparent.  Is the script run-octave safe for parallel execution?

--Rik
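
A note on reading the log above: the "Error 139" that make prints is the shell's exit status for the recipe, and POSIX shells conventionally report 128 plus the signal number when a command is killed by a signal. On Linux SIGSEGV is signal 11, so 139 matches the "Segmentation fault" lines. The following is a minimal Python sketch of that decoding; the function name `describe_make_error` is made up for illustration and is not part of make or Octave.

```python
import signal


def describe_make_error(status: int) -> str:
    """Interpret the N in make's 'Error N' for a recipe run through a POSIX shell.

    Assumption: a status above 128 means "killed by signal (status - 128)".
    A program can also exit with such a value on its own, so treat the
    result as a strong hint rather than proof.
    """
    if status > 128:
        signum = status - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            # No signal with that number on this platform.
            return f"killed by unknown signal {signum}"
    return f"exited with status {status}"


if __name__ == "__main__":
    print(describe_make_error(139))  # status from the segfault log above
    print(describe_make_error(1))    # ordinary failure status
```

Under this reading, a recipe failing with "Error 1" is an ordinary failure reported by the program itself, while "Error 139" is a crash.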


Re: segfaults building documentation when machine under load

José Abílio Matos
On Tuesday, 17 December 2019 15.55.14 WET Rik wrote:
> A possible clue is that this usually happens when generating text files,
> rather than when trying to generate actual images like png or pdf.  The
> text is generated very quickly which means that race conditions might
> become more apparent.  Is the script run-octave safe for parallel execution?
>
> --Rik

Using Fedora 31, I compile Octave on a 4-core machine (with
hyperthreading):

make -j3

Things get slow mainly at some stages (when the linker is involved?) because
memory consumption becomes high (this machine has 8 GB of RAM).

But I never noticed the crashes that you report.

Regards,
--
José Matos



Re: segfaults building documentation when machine under load

Dmitri A. Sergatskov
In reply to this post by Rik-4
On Tue, Dec 17, 2019 at 9:56 AM Rik <[hidden email]> wrote:

>
> I'm getting a vaguely repeatable situation where building the documentation
> fails when the machine doing the work is under stress.
>
> Example errors:
>
> /bin/bash: line 1: 24234 Segmentation fault      (core dumped) /bin/bash
> run-octave --norc --silent --no-history --path
> /home/rik/wip/Projects_Mine/octave-dev/doc/interpreter/ --eval
> "interpimages ('doc/interpreter/', 'interpft', 'txt');"
> Makefile:27944: recipe for target 'doc/interpreter/interpft.txt' failed
> make[2]: *** [doc/interpreter/interpft.txt] Error 139
> make[2]: *** Waiting for unfinished jobs....
> fatal: caught signal Segmentation fault -- stopping myself...
> fatal: caught signal Segmentation fault -- stopping myself...
> fatal: caught signal Segmentation fault -- stopping myself...
> /bin/bash: line 1: 25316 Segmentation fault      (core dumped) /bin/bash
> run-octave --norc --silent --no-history --path
> /home/rik/wip/Projects_Mine/octave-dev/doc/interpreter/ --eval
> "interpimages ('doc/interpreter/', 'interpderiv2', 'txt');"
> Makefile:27950: recipe for target 'doc/interpreter/interpderiv2.txt' failed
> make[2]: *** [doc/interpreter/interpderiv2.txt] Error 139
> /bin/bash: line 1: 25338 Segmentation fault      (core dumped) /bin/bash
> run-octave --norc --silent --no-history --path
> /home/rik/wip/Projects_Mine/octave-dev/doc/interpreter/ --eval "plotimages
> ('doc/interpreter/', 'hist', 'txt');"
> Makefile:27996: recipe for target 'doc/interpreter/hist.txt' failed
> make[2]: *** [doc/interpreter/hist.txt] Error 139
>
> Are other people experiencing this as well?  I think I saw something about
> the Fedora buildbots also having this issue.
>
> To be sure, I happen to have my local machine stressed.  6 of 8 cores are
> pegged and then I am running 'make -j8' to do the build.  Also, uptime
> reports an average load of 10.1.
>
> A possible clue is that this usually happens when generating text files,
> rather than when trying to generate actual images like png or pdf.  The
> text is generated very quickly which means that race conditions might
> become more apparent.  Is the script run-octave safe for parallel execution?
>
> --Rik
>
>

Yeah, we have been talking about it for years :)
Most of the Fedora buildbot failures are due to the same error. I get it on my
workstation most of the time. I get it on my laptop (with Clear Linux)
most of the time.
I do not think it is a load issue; it is more like a race condition /
timing issue.

Dmitri.
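
Since the failure is intermittent, one way to put a number on "most of the time" is to rerun a single failing command repeatedly and count the non-zero exit statuses. The sketch below is a generic harness written under that assumption; `count_failures` is a hypothetical helper, and the command in the demo is a harmless stand-in rather than the real run-octave invocation from the log.

```python
import subprocess
import sys


def count_failures(cmd, runs):
    """Run `cmd` `runs` times and return the non-zero return codes observed.

    Note: when the child is run directly (no shell), Python reports death
    by signal N as return code -N, so a segfault shows up as -11 on Linux
    rather than the 139 that bash and make print.
    """
    failures = []
    for _ in range(runs):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures.append(result.returncode)
    return failures


if __name__ == "__main__":
    # Stand-in command for the demo; replace it with the command under test.
    cmd = [sys.executable, "-c", "pass"]
    runs = 5
    failures = count_failures(cmd, runs)
    print(f"{len(failures)} of {runs} runs failed: {failures}")
```

Running the same loop with and without background load would help separate a load effect from a pure timing issue.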

Re: segfaults building documentation when machine under load

Dmitri A. Sergatskov
In reply to this post by José Abílio Matos
On Tue, Dec 17, 2019 at 10:13 AM José Abílio Matos <[hidden email]> wrote:

> But I never noticed the crashes that you report.
>
> Regards,
> --
> José Matos
>

Are those incremental builds?

After you have built, try
rm -rf doc/ ; make -j4

Dmitri.

Re: segfaults building documentation when machine under load

Juan Pablo Carbajal-2
In reply to this post by Rik-4
I do not get the same error, but I do get this:

Makefile:27896: recipe for target 'doc/interpreter/voronoi.png' failed
make[2]: *** [doc/interpreter/voronoi.png] Error 1
make[2]: *** Waiting for unfinished jobs....
error: '__octave_link_enabled__' undefined near line 5, column 5
error: called from
    /home/juanpi/Devel/octave/build-default/libgui/graphics/PKG_ADD at
line 5 column 3
error: imwrite: invalid empty image
error: called from
    __imwrite__ at line 34 column 5
    imwrite at line 119 column 5
    print at line 748 column 13
    geometryimages at line 72 column 5
error: imwrite: invalid empty image
error: called from
    __imwrite__ at line 34 column 5
    imwrite at line 119 column 5
    print at line 748 column 13
    geometryimages at line 79 column 5

Rerunning the build a couple of times solves the issue, though.

Re: segfaults building documentation when machine under load

José Abílio Matos
In reply to this post by Dmitri A. Sergatskov
On Tuesday, 17 December 2019 16.25.34 WET Dmitri A. Sergatskov wrote:
> Are those incremental builds?

Yes.
 
> After you have built, try
> rm -rf doc/ ; make -j4
>
> Dmitri.

I tried it just now, but it worked. :-)

But now that Juan Pablo has mentioned a failure in voronoi.png, I remember that
I got one of those during some random compilation.

Since it succeeded the next time, I ignored it. I never reported it because I
do incremental builds, so it could have been a bad transient state.

So it seems that I also get the problem, but so rarely that I forgot about it. :-)
--
José Matos



Re: segfaults building documentation when machine under load

Juan Pablo Carbajal-2
> But now that Juan Pablo has mentioned a failure in voronoi.png, I remember that
> I got one of those during some random compilation.

The error is triggered randomly by almost all of the .png targets, and only
when I use more than one job.
Doing "rm -rf doc/" before compiling doesn't seem to prevent the error.
Here is another case:

  GEN      doc/interpreter/interpderiv1.png
error: imwrite: invalid empty image
error: called from
    __imwrite__ at line 34 column 5
    imwrite at line 119 column 5
    print at line 748 column 13
    geometryimages at line 99 column 5
Makefile:27906: recipe for target 'doc/interpreter/inpolygon.png' failed
make[2]: *** [doc/interpreter/inpolygon.png] Error 1
make[2]: *** Waiting for unfinished jobs....
error: imwrite: invalid empty image
error: called from
    __imwrite__ at line 34 column 5
    imwrite at line 119 column 5
    print at line 748 column 13
    interpimages at line 54 column 5
Makefile:27938: recipe for target 'doc/interpreter/interpn.png' failed
make[2]: *** [doc/interpreter/interpn.png] Error 1
make[2]: Leaving directory '/home/juanpi/Devel/octave/builds/default'
Makefile:26374: recipe for target 'all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/home/juanpi/Devel/octave/builds/default'
Makefile:9958: recipe for target 'all' failed
make: *** [all] Error 2

From the error message, I guess the problem is that one thread is still
writing the image while another is trying to use it.
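
If that guess is right, a common general remedy for "reader sees a half-written file" is to write to a temporary file in the same directory and rename it into place once it is complete. The Python sketch below only illustrates that technique; it is not how Octave's print or imwrite work, nothing in this thread confirms the cause, and `write_atomically` is a hypothetical helper.

```python
import os
import tempfile


def write_atomically(path, data: bytes) -> None:
    """Write `data` to `path` so that readers never see a partial file.

    The data goes to a temporary file in the same directory, which is then
    renamed over the target. Readers see either the old file or the
    complete new one.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # atomic within one filesystem on POSIX
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "figure.png")
        write_atomically(target, b"not really a PNG, just demo bytes")
        print(os.path.getsize(target), "bytes written to", target)
```

The temporary file is created next to the target because a rename is only atomic when source and destination are on the same filesystem.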