buildbot test failures

buildbot test failures

John W. Eaton
Administrator
I recently made some changes so that buildbot builds report success only
if the number of failed tests is exactly 0.  You can see here

   buildbot.octave.org:8010

that we currently aren't doing so well.
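
The success criterion described above (a build is reported green only when the failure count is exactly 0) can be sketched as a small shell check. This is only an illustration, not the actual buildbot configuration: the names `failed_count` and `build_status` are hypothetical, and the summary format (a line whose first field is `FAIL`, followed by a count) is an assumption about the test log.

```shell
#!/bin/sh
# Hypothetical helper: read a test summary on stdin and print the number
# of failed tests.  Assumes a summary line of the form "  FAIL   <count>".
# If no such line is present, 0 is printed.
failed_count() {
  awk '$1 == "FAIL" { n = $2 } END { print n + 0 }'
}

# Hypothetical helper: succeed only if the failure count is exactly 0.
build_status() {
  [ "$(failed_count)" -eq 0 ]
}
```

A build step could then pipe its test log through `build_status` and use the exit status as the step result. Note that this sketch treats a log with no `FAIL` line as a success, which a real setup would probably want to reject.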

On the system that runs the following builders

   gcc-fedora
   gcc-lto-fedora
   clang-fedora

some tests fail because of a current Fedora bug in OSMesa.  I suppose I
could (temporarily) disable OSMesa on those systems to eliminate the
failures.

On the system that runs the following builders

   gcc-6-debian
   clang-3.9-debian

8 tests are failing because there is no audio device.  I could skip
those tests if there were a way to check whether audio devices are
available, but I'm not sure how to do that.  Currently, audiodevinfo
throws an error if there is no audio device.  Would it be OK to change
that so it simply returns an empty structure?  For my purposes that
would make checking easier, but I'm not sure whether it will cause other
trouble.
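
One possible way to probe for an audio device from a build script, given the current behavior described above (audiodevinfo throws an error when no device is present), is sketched below. The function name `has_audio_device` and the `OCTAVE` override variable are hypothetical, and the sketch would need changing if audiodevinfo were modified to return an empty structure instead.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if Octave can enumerate audio devices,
# non-zero otherwise.  Relies on audiodevinfo raising an error when no
# device is present.  OCTAVE (an assumption) selects the interpreter and
# defaults to "octave"; a missing interpreter also counts as "no device".
has_audio_device() {
  "${OCTAVE:-octave}" --quiet --eval \
    'try, audiodevinfo (); catch, exit (1); end' >/dev/null 2>&1
}

# Example use (not run here):
#   has_audio_device || echo "no audio device; skipping audio tests"
```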

The clang-osx build fails because of a problem with integer typedefs
that I reported here:

   https://savannah.gnu.org/bugs/?50510

The gcc-6-lto-debian build fails because of a mysterious problem saving
and loading signed 32-bit integers to HDF5 files.

And finally, the mxe-native-on-debian build is failing because the
of-database package configure script is failing with the following message:

   checking for PQconnectdb in -lpq... no
   configure: error: unable to find the PQconnectdb() function in pq

but I think that's just a missing package problem on my build system, so
I'll fix that and try again.

I'd like to eliminate these failures in some way so that we can expect
to see all builds regularly succeeding.  If failures are rare, then I
think we can be more sure that regressions are noticed soon after they
happen.

Comments, suggestions, and help are all welcome.

Thanks,

jwe



Re: buildbot test failures

Olaf Till-2
On Wed, Mar 15, 2017 at 05:06:34PM -0400, John W. Eaton wrote:
> And finally, the mxe-native-on-debian build is failing because the
> of-database package configure script is failing with the following message:
>
>   checking for PQconnectdb in -lpq... no
>   configure: error: unable to find the PQconnectdb() function in pq

At this point the pg_config binary should already have been found, so
it seems you have postgresql installed. Postgresql normally comes with
libpq...

Olaf

> but I think that's just a missing package problem on my build system, so
> I'll fix that and try again.

--
public key id EAFE0591, e.g. on x-hkp://pool.sks-keyservers.net


Re: buildbot test failures

John W. Eaton
Administrator
On 03/16/2017 02:02 AM, Olaf Till wrote:

> On Wed, Mar 15, 2017 at 05:06:34PM -0400, John W. Eaton wrote:
>> And finally, the mxe-native-on-debian build is failing because the
>> of-database package configure script is failing with the following message:
>>
>>   checking for PQconnectdb in -lpq... no
>>   configure: error: unable to find the PQconnectdb() function in pq
>
> At this point the pg_config binary should already have been found, so
> it seems you have postgresql installed. Postgresql normally comes with
> libpq...
>
> Olaf
>
>> but I think that's just a missing package problem on my build system, so
>> I'll fix that and try again.

Oh, I just realized that since this is an mxe-octave build, it's not a
missing dependency problem that should be solved by installing something
on the build system, but it's something that should be fixed in
mxe-octave or the package configure script.

I see that the postgresql package has been installed by mxe-octave and
that the database package configure script finds pg_config.  But the
-lpq library is not found, even though it is installed (in the
mxe-octave usr/lib directory).  The config.log file says

configure:3340: checking for PQconnectdb in -lpq
configure:3365: gcc -o conftest -g -O2   conftest.c -lpq   >&5
/usr/bin/ld: cannot find -lpq
collect2: error: ld returned 1 exit status

This build is using the system compiler, which doesn't search the
mxe-octave directories by default.  So variables like CPPFLAGS,
CFLAGS, CXXFLAGS, FFLAGS, and LDFLAGS should really be set so that
libraries compiled and installed by mxe-octave are found instead
of the ones installed in system directories.

I think it should be working for the next build that is attempted
because I installed the libpq-dev package on that buildbot worker
system, but that's not really the right fix.
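
As a rough sketch of the flag setup mentioned above (making the system compiler search the mxe-octave tree first), a helper could generate the configure arguments from the install prefix. The function name `mxe_configure_args` is hypothetical, and the layout (`include/` and `lib/` directly under the prefix) is an assumption based on the usr/lib directory mentioned earlier; this is not how mxe-octave actually passes flags.

```shell
#!/bin/sh
# Hypothetical helper: print configure arguments that make the system
# compiler search an mxe-octave install prefix for headers and libraries.
# Assumes headers live in <prefix>/include and libraries in <prefix>/lib.
mxe_configure_args() {
  prefix=$1
  printf '%s\n' "CPPFLAGS=-I$prefix/include" "LDFLAGS=-L$prefix/lib"
}

# Example use (not run here; the path is a placeholder and must not
# contain spaces for the unquoted expansion to work):
#   ./configure $(mxe_configure_args /path/to/mxe-octave/usr)
```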

To properly test this kind of build, I think we need to do it in a VM or
on some other system that has a minimal set of development dependencies
installed.  I don't think it's anything that you need to fix in the
database package.

jwe



Re: buildbot test failures

Carnë Draug
In reply to this post by John W. Eaton
On 15 March 2017 at 21:06, John W. Eaton <[hidden email]> wrote:
> [...]
> 8 tests are failing because there is no audio device.  I could skip those
> tests if there were a way to check whether audio devices are available, but
> I'm not sure how to do that.  Currently, audiodevinfo throws an error if
> there is no audio device.  Would it be OK to change that so it simply
> returns an empty structure?  For my purposes that would make checking
> easier, but I'm not sure whether it will cause other trouble.
> [...]

Returning an empty struct makes sense to me.  However, audiodevinfo is
a Matlab function so I guess we should do whatever Matlab does.

I tried to test it (in Matlab R2010b) but got an error back saying
that it only works on Windows.

Carnë


Re: buildbot test failures

Olaf Till-2
In reply to this post by John W. Eaton
On Thu, Mar 16, 2017 at 10:37:04AM -0400, John W. Eaton wrote:
> To properly test this kind of build, I think we need to do it in a VM or on
> some other system that has a minimal set of development dependencies
> installed.  I don't think it's anything that you need to fix in the database
> package.

Ok. But if you are using the released version of the package, I'd
recommend switching to the latest tip. Otherwise you may run into a bug
in the Makefile that probably affects cross-building (the -L option for
pq is set _before_ the -L options for the Octave libs on the g++ command
line). If necessary, I can make a new release.

Olaf

--
public key id EAFE0591, e.g. on x-hkp://pool.sks-keyservers.net


Re: buildbot test failures

Dmitri A. Sergatskov
In reply to this post by John W. Eaton
On Wed, Mar 15, 2017 at 4:06 PM, John W. Eaton <[hidden email]> wrote:
> <...>
>
> The gcc-6-lto-debian build fails because of a mysterious problem saving
> and loading signed 32-bit integers to HDF5 files.
> <...>

John,

I think the way the buildbot slaves are configured to run LTO is not quite right.
One needs to set AR, NM, and RANLIB to gcc-ar, gcc-nm, and gcc-ranlib, respectively.
The reason it kind of works now is that -ffat-lto-objects preserves the non-LTO symbols
that nm/ar/ranlib use. Perhaps this leads to some obscure problem like the failing HDF5 test.

In any case, I just built Octave with

../configure CFLAGS="-flto=4" CXXFLAGS="-flto=4" FFLAGS="-flto=4" LDFLAGS="-flto=4" NM="gcc-nm" AR="gcc-ar" RANLIB="gcc-ranlib"

and it works fine and passes all the tests. (I am not sure if LDFLAGS is really needed).
So you may want to adjust flags accordingly.
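
For illustration only, the configure arguments above could be generated by a small script that also checks that the GCC wrapper tools are on PATH. The function names `lto_configure_args` and `missing_lto_tools` are hypothetical, and LDFLAGS is left out here because it may not be needed.

```shell
#!/bin/sh
# Hypothetical helper: print configure arguments for an LTO build that
# uses the GCC wrapper tools.  The optional argument is the number of
# parallel LTO jobs passed to -flto (defaults to 4).
lto_configure_args() {
  jobs=${1:-4}
  for var in CFLAGS CXXFLAGS FFLAGS; do
    printf '%s=-flto=%s\n' "$var" "$jobs"
  done
  printf '%s\n' "AR=gcc-ar" "NM=gcc-nm" "RANLIB=gcc-ranlib"
}

# Hypothetical helper: print the name of each wrapper tool that is not
# found on PATH (prints nothing if all three are available).
missing_lto_tools() {
  for tool in gcc-ar gcc-nm gcc-ranlib; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Example use (not run here):
#   ../configure $(lto_configure_args 4)
```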

Also, the clang bug that prevented linking GraphicsMagick is gone (at least on Fedora), so the
--without-magick option in the clang configure could be dropped as well.

Sincerely,

Dmitri.


Re: buildbot test failures

Dmitri A. Sergatskov
On Wed, Apr 12, 2017 at 1:35 AM, Dmitri A. Sergatskov <[hidden email]> wrote:
> On Wed, Mar 15, 2017 at 4:06 PM, John W. Eaton <[hidden email]> wrote:
>> <...>
>>
>> The gcc-6-lto-debian build fails because of a mysterious problem saving
>> and loading signed 32-bit integers to HDF5 files.
>> <...>
>
> John,
>
> I think the way the buildbot slaves are configured to run LTO is not quite right.
> One needs to set AR, NM, and RANLIB to gcc-ar, gcc-nm, and gcc-ranlib, respectively.
> The reason it kind of works now is that -ffat-lto-objects preserves the non-LTO symbols
> that nm/ar/ranlib use. Perhaps this leads to some obscure problem like the failing HDF5 test.
>
> In any case, I just built Octave with
>
> ../configure CFLAGS="-flto=4" CXXFLAGS="-flto=4" FFLAGS="-flto=4" LDFLAGS="-flto=4" NM="gcc-nm" AR="gcc-ar" RANLIB="gcc-ranlib"
>
> and it works fine and passes all the tests. (I am not sure if LDFLAGS is really needed).
> So you may want to adjust flags accordingly.

I guess I spoke too soon here. I still get the HDF5 failure if I set the flags
to "-O2 -flto=4".

I also noticed the following warning in the LTO build (which I do not see in the default gcc build):

link: g++ -fPIC -pthread -fopenmp -Wall -W -Wshadow -Wold-style-cast -Wformat -Wpointer-arith -Wwrite-strings -Wcast-align -Wcast-qual -O2 -flto=4 -ffat-lto-objects -o src/octave-config src/src_octave_config-octave-config.o  libinterp/corefcn/.libs/libcorefcn.a libgnu/.libs/libgnu.a -lutil -lm -fopenmp -pthread
../src/liboctave/system/mach-info.cc:34:10: warning: type of 'd1mach_' does not match original declaration [-Wlto-type-mismatch]
   double F77_FUNC (d1mach, D1MACH) (const octave_idx_type&);
          ^
../src/liboctave/cruft/misc/d1mach.f:1:14: note: 'd1mach' was previously declared here
       double precision function d1mach (i)
              ^
../src/liboctave/cruft/misc/d1mach.f:1:14: note: code may be misoptimized unless -fno-strict-aliasing is used
libtool: link: (cd "liboctave/.libs" && rm -f "liboctave.so.4" && ln -s "liboctave.so.4.0.0" "liboctave.so.4")

Not sure how important this is, or whether the test failure has anything to do with it.

> Also, the clang bug that prevented linking GraphicsMagick is gone (at least on Fedora), so the
> --without-magick option in the clang configure could be dropped as well.

Sincerely,

Dmitri.



Re: buildbot test failures

John W. Eaton
Administrator
On 04/12/2017 03:20 AM, Dmitri A. Sergatskov wrote:

> On Wed, Apr 12, 2017 at 1:35 AM, Dmitri A. Sergatskov
> <[hidden email]> wrote:
>
>     On Wed, Mar 15, 2017 at 4:06 PM, John W. Eaton <[hidden email]> wrote:
>     <...>
>
>         The gcc-6-lto-debian build fails because of a mysterious problem
>         saving and loading signed 32-bit integers to HDF5 files.
>         <...>
>
>     John,
>
>     I think the way the buildbot slaves are configured to run LTO is
>     not quite right.
>     One needs to set AR, NM, and RANLIB to gcc-ar, gcc-nm, and
>     gcc-ranlib, respectively.
>     The reason it kind of works now is that -ffat-lto-objects preserves
>     the non-LTO symbols that nm/ar/ranlib use. Perhaps this leads to
>     some obscure problem like the failing HDF5 test.
>
>     In any case, I just built Octave with
>
>     ../configure CFLAGS="-flto=4" CXXFLAGS="-flto=4" FFLAGS="-flto=4"
>     LDFLAGS="-flto=4" NM="gcc-nm" AR="gcc-ar" RANLIB="gcc-ranlib"
>
>     and it works fine and passes all the tests. (I am not sure if
>     LDFLAGS is really needed).
>     So you may want to adjust flags accordingly.

I will add the configure flags in the buildbot config.

> I guess I spoke too soon here. I still get the HDF5 fail if I set the
> flags to "-O2 -flto=4".

Hmm, OK.  I will add the configure flags in the buildbot config.

> I also noticed in lto compile the following warning (which I do not see
> in gcc default compile):
>
> link: g++ -fPIC -pthread -fopenmp -Wall -W -Wshadow -Wold-style-cast -Wformat -Wpointer-arith -Wwrite-strings -Wcast-align -Wcast-qual -O2 -flto=4 -ffat-lto-objects -o src/octave-config src/src_octave_config-octave-config.o  libinterp/corefcn/.libs/libcorefcn.a libgnu/.libs/libgnu.a -lutil -lm -fopenmp -pthread
> ../src/liboctave/system/mach-info.cc:34:10: warning: type of 'd1mach_' does not match original declaration [-Wlto-type-mismatch]
>    double F77_FUNC (d1mach, D1MACH) (const octave_idx_type&);
>           ^
> ../src/liboctave/cruft/misc/d1mach.f:1:14: note: 'd1mach' was previously declared here
>        double precision function d1mach (i)
>               ^
> ../src/liboctave/cruft/misc/d1mach.f:1:14: note: code may be misoptimized unless -fno-strict-aliasing is used
> libtool: link: (cd "liboctave/.libs" && rm -f "liboctave.so.4" && ln -s "liboctave.so.4.0.0" "liboctave.so.4")
>
> Not sure how important this is, and if test failure has anything to do
> with that.

I fixed the declaration here:

http://hg.savannah.gnu.org/hgweb/octave/rev/41dcc5c1e41f

>     Also -- clang's bug not linking graphicsmagick is gone (at least on
>     Fedora), so the
>     --without-magick in clang configure could be dropped as well.

OK, I will update the options in the buildbot config.

jwe



Re: buildbot test failures

Dmitri A. Sergatskov
On Wed, Apr 12, 2017 at 9:34 AM, John W. Eaton <[hidden email]> wrote:

>> I guess I spoke too soon here. I still get the HDF5 fail if I set the
>> flags to "-O2 -flto=4".
>
> Hmm, OK.  I will add the configure flags in the buildbot config.

Do you want to keep "-ffat-lto-objects" for some reason?
I also found out that you do not need LDFLAGS, and that io.tst passes with -O1
but fails with -Os or -O2.

So, from my point of view:

../src/configure 'CFLAGS=-O2 -flto=4' 'CXXFLAGS=-O2 -flto=4' 'FFLAGS=-O2 -flto=4' 'NM=gcc-nm' 'AR=gcc-ar' 'RANLIB=gcc-ranlib'

Dmitri.


Re: buildbot test failures

John W. Eaton
Administrator
On 04/12/2017 07:23 PM, Dmitri A. Sergatskov wrote:

> On Wed, Apr 12, 2017 at 9:34 AM, John W. Eaton <[hidden email]> wrote:
>
>         I guess I spoke too soon here. I still get the HDF5 fail if I
>         set the flags to "-O2 -flto=4".
>
>     Hmm, OK.  I will add the configure flags in the buildbot config.
>
> Do you want to keep "-ffat-lto-objects" for some reason?
> I also found out that you do not need LDFLAGS, and io.tst would pass
> with -O1, but fail with -Os or -O2.
>
> So, from my point of view
>
> ../src/configure 'CFLAGS=-O2 -flto=4' 'CXXFLAGS=-O2 -flto=4'
> 'FFLAGS=-O2 -flto=4' 'NM=gcc-nm' 'AR=gcc-ar' 'RANLIB=gcc-ranlib'

I had to look up what fat-lto-objects are.  I guess we don't need them
for these builds since we know that the tools are present to support LTO.

If the test passes with one level of optimization but not another, are
we doing something wrong or is this a compiler/linker bug?

jwe



Re: buildbot test failures

Dmitri A. Sergatskov
On Thu, Apr 13, 2017 at 8:58 AM, John W. Eaton <[hidden email]> wrote:

> If the test passes with one level of optimization but not another, are we
> doing something wrong or is this a compiler/linker bug?

I guess it is a compiler bug: it was working fine on my machine at first,
but after some updates it ran into this problem. I also tried gcc 7.0.1 and
the bug is still there. But it could be some obscure HDF5 library bug or
Octave bug that gets exposed by the high level of optimization. I tried to
debug it a little, but I did not get too far (yet). I can tell that both
reads and writes of large (signed) integers are broken (i.e., an Octave
build with the bug does not correctly read HDF5 files written by a build
without it). If it is a compiler bug, it is kind of strange that this is the
only thing affected. I am actually using this version of Octave for my
"everyday stuff", hoping to see whether I hit some other problem, but so far
everything looks fine.

Dmitri.


Re: buildbot test failures

Dmitri A. Sergatskov
I compiled a recent tip with

 ../configure "CFLAGS=-O0 -fsanitize=undefined -flto=4" "CXXFLAGS=-O0 -fsanitize=undefined -flto=4" "FFLAGS=-O0 -fsanitize=undefined -flto=4" "AR=gcc-ar" "NM=gcc-nm" "RANLIB=gcc-ranlib"

Then, when I run the io tests, I get:

octave:1> test ../test/io.tst
../libinterp/corefcn/ls-mat5.cc:2418:12: runtime error: null pointer passed as argument 1, which is declared to never be null
../libinterp/corefcn/ls-mat5.cc:2419:13: runtime error: null pointer passed as argument 1, which is declared to never be null
PASSES 142 out of 142 tests

Is this a problem?

Dmitri.