Test suite regressions vs expected failures

Test suite regressions vs expected failures

Mike Miller-4
Rik,

You may have missed the discussion about a month ago [1], but there are
four tests that you added to the test script "for.tst" that are failing
with the new "regression" status. You added these tests as part of the
fix for bug #50893 [2].

I'm not completely sure what the intent was: were these tests supposed
to pass as part of the bug fix?  Or was your intent to add them as xtests
that fail because of behavior we are intentionally not following (a test
of a "won't fix" feature)?

In [1] we didn't really come to any conclusions about whether we want to
add tests for things that are "won't fix", or whether to mark them
somehow so they aren't flagged as regressions. But that's how they look
for now.

[1]: https://lists.gnu.org/archive/html/octave-maintainers/2017-07/msg00040.html
[2]: https://savannah.gnu.org/bugs/?50893

--
mike


Re: Test suite regressions vs expected failures

Rik-4
On 08/16/2017 05:45 PM, Mike Miller wrote:
> Rik,
>
> You may have missed the discussion about a month ago [1], but there are
> four tests that you added to the test script "for.tst" that are failing
> with the new "regression" status. You added these tests as part of the
> fix for bug #50893 [2].

When I added the tests they were not tagged as regressions, just regular
tests.  The '*' tag should be removed from these BIST tests.

>
> I'm not completely sure what the intent was: were these tests supposed
> to pass as part of the bug fix?  Or was your intent to add them as xtests
> that fail because of behavior we are intentionally not following (a test
> of a "won't fix" feature)?
>
> In [1] we didn't really come to any conclusions about whether we want to
> add tests for things that are "won't fix", or whether to mark them
> somehow so they aren't flagged as regressions. But that's how they look
> for now.
>
> [1]: https://lists.gnu.org/archive/html/octave-maintainers/2017-07/msg00040.html
> [2]: https://savannah.gnu.org/bugs/?50893
>
For bug #50893 we need to decide what level of Matlab compatibility we are
striving for.  The tests in for.tst contain actual expected results as
determined under Matlab.  For example, this test

-- Code --
%! cnt = 0;
%! for k = zeros (0,3);
%!   cnt++;
%! endfor
%! assert (cnt, 0);
%! assert (k, zeros (0,1));
-- End Code --

Both Octave and Matlab now create an empty variable k, which is the more
important thing.  Of lesser importance, but up for discussion here on the
Maintainer's List, is whether we want exact compatibility.  Octave executes
the initialization statement exactly as written in the for loop, so the
variable k is of size 0x3.  Matlab also creates the empty variable k, but
its size is 0x1.
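
To make the difference concrete, a minimal sketch of what each
interpreter reports after the loop (the Matlab size is the one
described above):

-- Code --
for k = zeros (0,3); endfor
size (k)    ## Octave:  ans = 0   3
            ## Matlab (as reported in the bug):  ans = 0   1
-- End Code --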

--Rik  




Re: Test suite regressions vs expected failures

Mike Miller-4
On Sat, Aug 19, 2017 at 22:24:51 -0700, Rik wrote:
> When I added the tests they were not tagged as regressions, just regular
> tests.  The '*' tag should be removed from these BIST tests.

Right, that marking didn't even exist at the time; it was added later by
jwe because the bug is marked as fixed.

So I now understand that the test was added to reflect a portion of the
bug that wasn't fixed, even though the bug is closed and marked fixed.

> For bug #50893 we need to decide what level of Matlab compatibility we are
> striving for.  The tests in for.tst contain actual expected results as
> determined under Matlab.  For example, this test
>
> -- Code --
> %! cnt = 0;
> %! for k = zeros (0,3);
> %!   cnt++;
> %! endfor
> %! assert (cnt, 0);
> %! assert (k, zeros (0,1));
> -- End Code --
>
> Both Octave and Matlab now create an empty variable k, which is the more
> important thing.  Of lesser importance, but up for discussion here on the
> Maintainer's List, is whether we want exact compatibility.  Octave executes
> the initialization statement exactly as written in the for loop, so the
> variable k is of size 0x3.  Matlab also creates the empty variable k, but
> its size is 0x1.

I think Octave is compatible enough, and arguably more correct here, so
I would change the last line of the test to either

    %! assert (k, zeros (0,3));

or

    %! assert (isempty (k))
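
Taking the first option, the whole block would then read (the same test
as above, with only the last line changed):

    %! cnt = 0;
    %! for k = zeros (0,3);
    %!   cnt++;
    %! endfor
    %! assert (cnt, 0);
    %! assert (k, zeros (0,3));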

--
mike


Re: Test suite regressions vs expected failures

Mike Miller-4
In reply to this post by Rik-4
On Sat, Aug 19, 2017 at 22:24:51 -0700, Rik wrote:
> When I added the tests they were not tagged as regressions, just regular
> tests.  The '*' tag should be removed from these BIST tests.

To be clear, the '*' tag does not mark the test as a regression. The '*'
tag is supposed to mean that the bug number has been marked as fixed on
the bug tracker. No '*' means that the bug is still open. If I see a
test marked with "<12345>", that means there may still be work to be
done.

If a test fails it is flagged as either an expected failure or a
regression, depending on whether we thought the bug was fixed or not.

Any new tests that are added for bugs when they are fixed should include
the '*' marker.
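
For example, the two markings look like this in a BIST file (a sketch
reusing the bug numbers mentioned above; the test bodies are only
placeholders):

    ## bug 12345 is still open on the tracker, so no '*':
    %!test <12345>
    %! assert (2 + 2, 4);

    ## bug 50893 has been closed as fixed, so it carries the '*':
    %!test <*50893>
    %! assert (2 + 2, 4);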

--
mike


Re: Test suite regressions vs expected failures

John W. Eaton
On 08/21/2017 08:02 PM, Mike Miller wrote:

> On Sat, Aug 19, 2017 at 22:24:51 -0700, Rik wrote:
>> When I added the tests they were not tagged as regressions, just regular
>> tests.  The '*' tag should be removed from these BIST tests.
>
> To be clear, the '*' tag does not mark the test as a regression. The '*'
> tag is supposed to mean that the bug number has been marked as fixed on
> the bug tracker. No '*' means that the bug is still open. If I see a
> test marked with "<12345>", that means there may still be work to be
> done.
>
> If a test fails it is flagged as either an expected failure or a
> regression, depending on whether we thought the bug was fixed or not.
>
> Any new tests that are added for bugs when they are fixed should include
> the '*' marker.

Yes, so the problem is that these tests are tagged with a report number
that has been closed as fixed.

If we intend to fix these problems, then either the bug report should
not be closed as fixed, or we need a new report that is not closed.

If we close the report as "won't fix", then should we have tests that
fail forever?  Or should we have another way to flag them in the tests
so that they are reported as "issues we know about but don't plan to
fix"?  That was the question I was originally trying to ask.

jwe


Re: Test suite regressions vs expected failures

Mike Miller-4
On Tue, Aug 22, 2017 at 11:15:34 -0400, John W. Eaton wrote:
> If we close the report as "won't fix", then should we have tests that fail
> forever?  Or should we have another way to flag them in the tests so that
> they are reported as "issues we know about but don't plan to fix"?  That was
> the question I was originally trying to ask.

Yeah, or in this case, a bug whose fix addressed some things and left
other things unfixed.

I think we should be consistent with the markings, so in theory the
source tree can be scraped for bug numbers, the bug tracker can be
scraped for the corresponding reports, and they will all be either open
(no '*' marking) or closed as fixed (with '*' marking).

We could add another marking like '!' to indicate a bug that shows some
behavior that we are not going to fix, but are intentionally adding a
test anyway to show that we are not compatible.
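
Purely as a sketch of that idea (no such '!' marking exists today, so
this syntax is hypothetical):

    ## hypothetical "won't fix" marking for a known incompatibility:
    %!test <!50893>
    %! for k = zeros (0,3); endfor
    %! assert (k, zeros (0,1));    # the Matlab result we chose not to match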

But since the test will always fail, what does that show? Should the
test suite report a problem if a test marked "won't fix" actually passes
instead?

--
mike


Re: Test suite regressions vs expected failures

John W. Eaton
On 08/22/2017 02:00 PM, Mike Miller wrote:

> I think we should be consistent with the markings, so in theory the
> source tree can be scraped for bug numbers, the bug tracker can be
> scraped for the corresponding reports, and they will all be either open
> (no '*' marking) or closed as fixed (with '*' marking).

I already added a Makefile target (update-bug-status) to mostly automate
this job.

> We could add another marking like '!' to indicate a bug that shows some
> behavior that we are not going to fix, but are intentionally adding a
> test anyway to show that we are not compatible.
>
> But since the test will always fail, what does that show? Should the
> test suite report a problem if a test marked "won't fix" actually passes
> instead?

I'm not sure exactly what to do, but it seems useful to me to somehow
note incompatibilities that we know about but don't intend to fix so
that we have some record of them.  That way we have a relatively
easy-to-find pointer to the discussion(s) that resulted in marking them as
"won't fix".

jwe




Re: Test suite regressions vs expected failures

Carnë Draug
On 22 August 2017 at 19:49, John W. Eaton <[hidden email]> wrote:

> On 08/22/2017 02:00 PM, Mike Miller wrote:
> [...]
>> We could add another marking like '!' to indicate a bug that shows some
>> behavior that we are not going to fix, but are intentionally adding a
>> test anyway to show that we are not compatible.
>>
>> But since the test will always fail, what does that show? Should the
>> test suite report a problem if a test marked "won't fix" actually passes
>> instead?
>
>
> I'm not sure exactly what to do, but it seems useful to me to somehow note
> incompatibilities that we know about but don't intend to fix so that we have
> some record of them.  That way we have some relatively easy to find pointer
> to the discussion(s) that resulted in marking them as "won't fix".
>

This would be handy for the image package too; I have a few instances
like that.  The test fails not because I know there's a bug that
needs fixing, but because I want it to fail on purpose.  I guess in
that case a %!error block makes the most sense, since that's the
expected and desired result.
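
For example, something along these lines (a sketch with a stand-in call
rather than a real image package function):

    ## the error itself is the intended and desired behaviour
    %!error sqrt (1, 2)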

Carnë