Title: List expected failures by OS platform in
Type: behaviour Severity: normal
Components: Library Versions: Jython 2.7
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: adamburke, darjus, fwierzbicki, jeff.allen, stefan.richthofer, zyasoft
Priority: normal Keywords: test failure causes

Created on 2015-10-30.08:56:33 by jeff.allen, last changed 2018-03-14.18:38:59 by jeff.allen.

msg10417 (view) Author: Jeff Allen (jeff.allen) Date: 2015-10-30.08:56:32
We'd like to have clean-running regression tests but have never (?) managed it.

We like using CPython tests by default, with necessary exclusions and variations. With those aims, we have added a 'java' platform, extending the catalogue (_expectations) of tests it expects to raise unittest.SkipTest, and adding a catalogue (_failures) of tests it expects to fail.

But the catalogues tagged simply 'java' do not capture the variation in test success among OS platforms. In this scheme, if we want clean-running regression tests for everyone, we have to add to _failures every test that fails on any OS platform. A recent attempt in issue #2393 at getting clean regression tests on Windows ended by making these lists correct only for Windows (maybe just Cygwin). Some existing code tries to accommodate OS variation, but it is hard to read and may be CPython-specific. The lists cannot be maintained reliably from any one OS.

I propose we add optional sections with keys like 'java.nt', where the part after the dot is taken from os._name, to catalogue those things applicable to that OS. The key 'java' continues to hold those things applicable to all Jython OSs. It is perhaps only necessary to do this for _failures. The OS-specific sections ought to be short, or something is wrong.
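The proposed lookup could be sketched roughly as below. This is a hypothetical illustration, not the actual regrtest code: the test names and the expected_failures helper are made up, while the _failures name, the 'java'/'java.nt' keys, and os._name come from the proposal above.

```python
# Hypothetical sketch of the proposed scheme: the generic 'java' section
# applies everywhere, and an optional 'java.<os._name>' section adds
# OS-specific entries (e.g. 'java.nt' on Windows).

_failures = {
    # Tests expected to fail on every OS under Jython (placeholder names).
    'java': ['test_aaa', 'test_bbb'],
    # Tests expected to fail only on Windows, where os._name == 'nt'.
    'java.nt': ['test_ccc'],
}

def expected_failures(os_name):
    """Return the union of the generic and OS-specific expected failures."""
    tests = list(_failures.get('java', []))
    tests += _failures.get('java.' + os_name, [])
    return tests
```

On this sketch, a Windows run would pick up both sections, while any other OS would see only the generic 'java' list, keeping the OS-specific sections short.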

Good idea?
msg10418 (view) Author: Stefan Richthofer (stefan.richthofer) Date: 2015-10-30.15:12:49
First: I really appreciate that this topic is getting some traction!
Your suggestion for OS-specific sections sounds good to me as a mid-term solution. For the short term, I would propose to sort tests more rigorously into stable vs. unstable, so that regrtests become reliable again (and quickly).
From that position we can fix unstable tests step by step, or sort them into OS-specific sections, keeping regrtests reliable throughout the transition period.
msg10422 (view) Author: Jeff Allen (jeff.allen) Date: 2015-10-31.08:16:46
I've done this now for Windows, adding a 'java.nt' key to _failures.

I don't know how we would "rigorously sort tests into stable vs. unstable". I saw test_glob fail yesterday, out of the blue, and I couldn't repeat it. Does that make it unstable? (It was just an unlink() failure, so I made that non-fatal.)

If we want repeatable tests, we should err on the side of expecting as failures (or skipping in the module) those we find unreliable, but not without recording that choice in an issue.
msg10616 (view) Author: Jim Baker (zyasoft) Date: 2016-01-11.04:03:35
See also CPython: mostly stable, but not always, and dependent on platform.

I think we are now in pretty good shape, with the exception of test_ssl which runs stably outside of the regrtest, but not in it. Something we can look at, maybe for 2.7.1 RC.
msg11798 (view) Author: Jeff Allen (jeff.allen) Date: 2018-03-14.18:38:59
This is our common practice now that we have the means, so I'm closing as "fixed". However, it is worth saying that it is always better to skip individual failing tests than entire modules.
Date User Action Args
2018-03-14 18:38:59  jeff.allen  set  status: open -> closed
                                      resolution: accepted -> fixed
                                      messages: + msg11798
2016-01-11 04:03:35  zyasoft     set  assignee: jeff.allen ->
                                      messages: + msg10616
                                      nosy: + darjus
2015-10-31 08:16:47  jeff.allen  set  assignee: jeff.allen
                                      resolution: accepted
                                      messages: + msg10422
2015-10-30 15:12:49  stefan.richthofer  set  messages: + msg10418
2015-10-30 08:56:33  jeff.allen  create