Title: Clean running regression tests on Windows
Type: behaviour Severity: normal
Components: Library Versions: Jython 2.7
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: jeff.allen Nosy List: adamburke, darjus, jeff.allen, zyasoft
Priority: normal Keywords: patch

Created on 2015-09-10.15:10:11 by adamburke, last changed 2015-11-10.16:28:35 by zyasoft.

File name Uploaded Description Edit Remove
win_test_cleanup_2393.diff adamburke, 2015-09-10.15:12:01
README_2393.diff adamburke, 2015-10-18.12:40:33
win_test_cleanup_2393_2.diff adamburke, 2015-10-26.13:39:32
msg10246 (view) Author: Adam Burke (adamburke) Date: 2015-09-10.15:10:10
I found it difficult to provide useful regression test datapoints when the 2.7.0 release candidate builds were coming out. This small patch marks current failures and skips as expected. It also creates a more detailed memo file, including eg OS and Java version. A trivial batch script for running it under Windows is included for user convenience.

I'm hoping this can expand release candidate regression coverage by making the process of testing a new build on a platform lower cost. Download the build, run jython_regrtest.bat, and if there are no unexpected failures, that gives good confidence. If there are failures, mail the resulting regrtest_memo.txt (or an excerpt) to jython-dev or jython-users.

Certainly it would be even better to fix the failing tests outright, but I haven't built that expertise on the codebase. I realise build/test can be close to a developer's heart, and this is an unusual place for a first patch, but perhaps this suggested patch can start discussion. If it is found useful, note that I didn't have other platforms to test on, so they could still show failures. The same cleanup principle could be applied there.
msg10247 (view) Author: Adam Burke (adamburke) Date: 2015-09-10.15:12:01
Patch attached.
msg10313 (view) Author: Darjus Loktevic (darjus) Date: 2015-10-07.09:49:31
Hey Adam, have you signed the Python/Jython contributor agreements?

I like the idea of regrtest cleanup.
msg10342 (view) Author: Jeff Allen (jeff.allen) Date: 2015-10-09.17:29:18
A clean-running regrtest is a worthwhile objective. And also the idea that users could help us test more thoroughly on more platforms.

I applied this patch to my current repo tip and it makes regrtest run without reporting errors, as advertised -- a new experience :).

The patch suppresses tests that I don't have any trouble with. When I drop the main expected-failure list additions into a file "my.tests" I see:

> dist\bin\jython dist\Lib\test\ -f my.tests
23 tests OK.
10 tests failed:
    test_file2k test_httpservers test_netrc test_runpy test_shutil
    test_subprocess_jy test_sys test_tarfile test_urllib2 test_zipfile
23 tests passed unexpectedly:
    test___all__ test_bytes test_bz2 test_classpathimporter
    test_import test_import_pep328 test_java_integration
    test_jython_initializer test_list_jy test_logging test_marshal
    test_os_jy test_select test_socket test_sort test_ssl test_str
    test_string test_sys_jy test_unicode test_univnewlines
    test_userstring test_zipimport_jy

My repo has a few extra fixes, but I doubt I have fixed that many things! So I would not include those unexpected passes. Tests that report skips are not failures (except morally ;). Tests skipped wholesale can be "expected" separately from the expected failures.

I think the trivial batch script may be overkill: people will surely only run it if we ask (e.g. suggesting it in README.TXT), and we might as well give them a command to type/paste there. If we have jython_regrtest.bat, then I believe SETLOCAL will prevent JY_HOME leaking into the user's environment.

Is this just a typo?
-                            "test_applesingle"]
+                            "test_applesingle.pyingle"]

The thing bothering me, and it may just need getting used to, is that it's so quiet. How do we make regrtest nag *us* continually about things that need fixing, while only complaining (through our users or a build bot) about things we didn't already know about? Skips inside a test conceal less.

My recommendations (others free to argue):
1. Look into suppressing fewer tests. (Do things actually fail, or just contain skips?)
2. Fix: test_applesingle.pyingle
3. Add an invitation to README.TXT rather than adding jython_regrtest.bat.
4. Add your name to ACKNOWLEDGEMENTS!
msg10349 (view) Author: Adam Burke (adamburke) Date: 2015-10-11.12:37:52
Hi Darjus - I was unaware of the contributor agreement, but have signed it today (electronically) so it should be sorted now.
msg10358 (view) Author: Darjus Loktevic (darjus) Date: 2015-10-18.01:46:20
Hey Adam, Jeff makes a good point. I've committed most of your patch, except the skip bits. I think there's a good use for the better messaging and the bat file.
msg10360 (view) Author: Jeff Allen (jeff.allen) Date: 2015-10-18.08:25:16
Thanks Darjus. I have a bit of time free again for this, but you beat me to it.

I think Adam's idea to democratise testing is a good one: giving people a test invocation we believe runs cleanly every time. I'd prefer to skip the specific failing test cases to maximise the cover.

The tricky part is tests that do not seem to pass repeatably. These should nag us, I believe, but not run for the general public. Environment variable from the script maybe?

Adam: I added you to ACKNOWLEDGEMENTS when I committed your fix for #2396. I'm sure we'd welcome more help if you're able to give it.
msg10361 (view) Author: Adam Burke (adamburke) Date: 2015-10-18.12:40:33
Hi Jeff, Darjus, glad it looks useful.

I had started to fold in Jeff's comments locally but you beat me to it.

First simple thing - test_applesingle.pyingle is indeed a typo, probably a stray cut and paste, and so that line should be reverted.

Batch file - happy for you guys to make a call; it doesn't do much. I see Darjus has included it. A change to README along Jeff's suggested lines is attached if you want to use it:

To test new versions by running the regression tests, run jython_regrtest.bat or

jython -m test.regrtest -e -m regrtest_memo.txt

(The memo file regrtest_memo.txt will be useful in the bug report if
 you see test failures or other bugs)
msg10362 (view) Author: Adam Burke (adamburke) Date: 2015-10-18.12:53:39
On the meatier issue of exactly which tests are included and excluded, yes the idea is to make it easier for more people to test by giving a clean red or green result. This should also make it easier to spot regression problems for frequent developers and hopefully reduce stress around release candidates :)

The exact tests I included were simply the ones that failed on my laptop. When I run the command line Jeff uses, I do get unexpected passes now, and I agree we shouldn't have those. I still don't get the same set of passes and fails, though; maybe we can use this thread to refine them.

Methodically knocking out the skips that aren't genuinely system-driven then seemed the next step. I was hoping to keep plodding along on that, hopefully with a codebase-learning side effect.

My current test state for trunk

231 tests OK.
14 tests skipped:
    test__rawffi test_closuregen test_descr test_largefile
    test_longexp test_mhlib test_poll test_posix test_py3kwarn
    test_socketserver test_struct test_subprocess test_urllib2net
6 tests ran unexpectedly:
    test_asynchat test_asyncore test_cmd_line_script test_io
    test_select_new test_signal
17 tests failed:
    test_asynchat test_asyncore test_file2k test_httpservers
    test_logging test_netrc test_os_jy test_runpy test_shutil
    test_signal test_socket test_subprocess_jy test_sys test_sys_jy
    test_tarfile test_urllib2 test_zipfile
16 tests passed unexpectedly:
    test_bytes test_classpathimporter test_gc test_import
    test_import_pep328 test_java_integration test_jython_initializer
    test_list_jy test_marshal test_multibytecodec_support test_select
    test_sort test_ssl test_str test_unicode test_zipimport_jy
3 fails unexpected:
    test_asynchat test_asyncore test_signal


So for the next iteration of this I will check that they run both standalone and as a group, and put back the 16 passing tests.
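The bookkeeping behind categories like "fails unexpected" and "passed unexpectedly" can be sketched as a set comparison against the expected lists. This is an illustrative sketch only; the function and variable names (classify, expected_failures, expected_skips) are hypothetical, not Jython's actual regrtest internals:

```python
# Hypothetical sketch of how regrtest-style result categories can be
# derived from expected-failure and expected-skip lists.

def classify(results, expected_failures, expected_skips):
    """results maps test name -> one of 'ok', 'failed', 'skipped'."""
    failed = {t for t, r in results.items() if r == 'failed'}
    skipped = {t for t, r in results.items() if r == 'skipped'}
    passed = {t for t, r in results.items() if r == 'ok'}
    return {
        # failures not on the expected list are regressions to report
        'fails unexpected': sorted(failed - set(expected_failures)),
        # passes on the expected-failure list mean the list is stale
        'passed unexpectedly': sorted(passed & set(expected_failures)),
        # skips not on the expected list deserve a look too
        'skipped unexpectedly': sorted(skipped - set(expected_skips)),
    }

report = classify(
    {'test_gc': 'ok', 'test_netrc': 'failed', 'test_curses': 'skipped'},
    expected_failures=['test_gc', 'test_netrc'],
    expected_skips=['test_curses'],
)
```

With the sample data above, test_gc shows up as an unexpected pass, signalling that the expected-failure list should be trimmed, which is exactly the situation in the memos in this thread.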
msg10363 (view) Author: Jeff Allen (jeff.allen) Date: 2015-10-19.15:24:20
I'm puzzled I'm not getting nearly as many failures as Adam. I pulled a completely clean repo updated to Darjus' last changeset: .

For comparison I then get this memo from jython_regrtest.bat:

6 tests skipped:
    test_curses test_smtpnet test_socketserver test_subprocess
    test_urllib2net test_urllibnet
13 tests failed:
    test_file2k test_httpservers test_netrc test_os test_runpy
    test_select test_shutil test_subprocess_jy test_sys test_sys_jy
    test_tarfile test_urllib2 test_zipfile
13 fails unexpected:
    test_file2k test_httpservers test_netrc test_os test_runpy
    test_select test_shutil test_subprocess_jy test_sys test_sys_jy
    test_tarfile test_urllib2 test_zipfile
Command line: 
    ['test.regrtest', '-e', '-m', 'regrtest_memo.txt']

I couldn't get test_os to fail on its own: I think the failure above is due to a left-over file from a previous failed test.
msg10381 (view) Author: Adam Burke (adamburke) Date: 2015-10-26.13:39:32
New patch. Some of the tests I excluded as fails should have been excluded as skips. This plus some actual fixes going on in the codebase means no unexpected passes any more.

Also added some commentary around tests that only fail intermittently. 

Stefan separately noted that test_gc fails only very rarely and he is looking into it, so I included it in commented-out form. From memory, it has only failed for me once.

Latest without -e

367 tests OK.
94 tests skipped:
    test__locale test__osx_support test__rawffi test_aepack test_al
    test_applesingle test_ascii_formatd test_audioop test_bsddb
    test_bsddb185 test_bsddb3 test_capi test_cd test_cl
    test_closuregen test_codecmaps_cn test_codecmaps_hk
    test_codecmaps_jp test_codecmaps_kr test_codecmaps_tw
    test_commands test_cprofile test_ctypes test_curses test_dbm
    test_descr test_dl test_dummy_threading test_epoll test_fcntl
    test_fork1 test_gdb test_gdbm test_getargs2 test_gl test_grp
    test_hotshot test_imageop test_imgfile test_ioctl test_kqueue
    test_largefile test_lib2to3 test_linuxaudiodev test_longexp
    test_macos test_macostools test_mhlib test_mmap test_modulefinder
    test_msilib test_multibytecodec test_multiprocessing test_nis
    test_openpty test_ossaudiodev test_parser test_pep277 test_pipes
    test_poll test_posix test_pty test_pwd test_py3kwarn test_readline
    test_resource test_sax test_scriptpackages test_smtpnet
    test_socketserver test_sqlite test_startfile test_strop
    test_struct test_structmembers test_subprocess test_sunaudiodev
    test_symtable test_tcl test_timeout test_tk test_tools
    test_ttk_guionly test_ttk_textonly test_ucn test_unicode_file
    test_urllib2net test_urllibnet test_wait3 test_wait4 test_wave
    test_winreg test_winsound test_zipfile64
25 tests ran unexpectedly:
    test___all__ test_asynchat test_asyncore test_cmd_line_script
    test_compileall test_distutils test_email_codecs test_ftplib
    test_httplib test_io test_locale test_logging test_plistlib
    test_poplib test_profile test_pydoc test_select test_select_new
    test_signal test_smtplib test_sundry test_sys_setprofile
    test_sys_settrace test_telnetlib test_threading
48 tests failed:
    test_asynchat test_asyncore test_codecencodings_cn
    test_codecencodings_hk test_codecencodings_iso2022
    test_codecencodings_jp test_codecencodings_kr test_compileall
    test_compiler test_dis test_distutils test_email_codecs test_eof
    test_file2k test_frozen test_ftplib test_httplib test_httpservers
    test_iterlen test_locale test_logging test_mailbox test_netrc
    test_os_jy test_peepholer test_poplib test_profile test_pyclbr
    test_pydoc test_pyexpat test_runpy test_shutil test_signal
    test_smtplib test_socket test_stringprep test_subprocess_jy
    test_sundry test_sys test_sys_jy test_sys_setprofile
    test_sys_settrace test_tarfile test_threadsignals test_transformer
    test_urllib2 test_zipfile test_zipimport
17 fails unexpected:
    test_asynchat test_asyncore test_compileall test_distutils
    test_email_codecs test_ftplib test_httplib test_locale
    test_logging test_poplib test_profile test_pydoc test_signal
    test_smtplib test_sundry test_sys_setprofile test_sys_settrace

With -e

359 tests OK.
6 tests skipped:
    test_curses test_smtpnet test_socketserver test_subprocess
    test_urllib2net test_urllibnet

bad = [
skipped = [
6 tests skipped:
    test_curses test_smtpnet test_socketserver test_subprocess
    test_urllib2net test_urllibnet
0 tests failed:

msg10406 (view) Author: Jeff Allen (jeff.allen) Date: 2015-10-29.21:09:48

I'd like to commit this (almost) as is, in your name, as a quick way to quieten the tests. So thanks for that. Also, you're right: some of these, like lib2to3, are skips not failures.

Afterwards we should look for more specific ways of omitting the failing parts of tests instead: e.g. where only some cases in a test fail or only fail on Windows. At the same time, we open any new issues we need.

My "almost" means with the following differences (which I'll take care of):
1. Not the change to README.txt as it is basically what I already added.
2. We don't need to include in _expectations tests skipped on a denied resource (--use flag) and the codecmaps_* tests are failures for me.

At some stage the project seems to have thought the _expectations variable listed expected failures. I know that coming late to this, I didn't really understand this code until I re-worked some of it. It would be worth challenging entries one at a time to get that right. When I force them to run, most are skips, but some are failures (and a few are passes).
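Separating "tests we expect to skip on this platform" from "tests we expect to fail" can be sketched with two platform-keyed tables in the style of CPython's old regrtest _expectations mapping. The tables and the helper below are hypothetical illustrations, not Jython's actual code:

```python
# Illustrative sketch: per-platform tables of expected skips and expected
# failures, in the style of CPython 2.x regrtest's _expectations. The
# entries and the expected_sets() helper are hypothetical.
import sys

_expectations = {      # tests expected to be *skipped* on the platform
    'java': 'test_curses test_ctypes test_fcntl',
}
_failures = {          # tests expected to *fail* on the platform
    'java': 'test_netrc test_file2k',
}

def expected_sets(platform=None):
    """Return (expected_skips, expected_failures) for a platform."""
    platform = platform or sys.platform
    skips = set(_expectations.get(platform, '').split())
    fails = set(_failures.get(platform, '').split())
    return skips, fails

skips, fails = expected_sets('java')
```

Keeping the two lists disjoint makes it possible to challenge entries one at a time, as suggested above: force a listed test to run, and move it between tables (or delete it) according to what actually happens.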
msg10419 (view) Author: Adam Burke (adamburke) Date: 2015-10-30.15:46:35
Sounds good.
msg10423 (view) Author: Jeff Allen (jeff.allen) Date: 2015-10-31.08:18:36
This is your (Adam's) patch: (almost). And this: sorts tests carefully between _expectations (skips) and _failures. This runs cleanly for me on Windows:

>ant regrtest
     [exec] 371 tests OK.
     [exec] 1 test skipped:
     [exec]     test_curses

The skip is not "unexpected", so I think that's ok.

With #2419 also in place (which makes it 370 tests, BTW), we should turn some of the _failures, and maybe some _expectations, into tracked issues. In many cases I'm not sure whether a given failure is really a defect, as opposed to something we never expect of Jython. But we can start with the easy ones.
Date User Action Args
2015-11-10 16:28:35zyasoftsetstatus: pending -> closed
2015-10-31 08:18:37jeff.allensetmessages: + msg10423
2015-10-30 15:46:35adamburkesetmessages: + msg10419
2015-10-29 21:09:49jeff.allensetmessages: + msg10406
2015-10-26 13:39:34adamburkesetfiles: + win_test_cleanup_2393_2.diff
messages: + msg10381
2015-10-19 15:24:22jeff.allensetmessages: + msg10363
2015-10-18 12:53:40adamburkesetmessages: + msg10362
2015-10-18 12:40:35adamburkesetfiles: + README_2393.diff
messages: + msg10361
2015-10-18 08:25:17jeff.allensetstatus: open -> pending
resolution: accepted
messages: + msg10360
2015-10-18 01:46:21darjussetmessages: + msg10358
2015-10-11 12:37:52adamburkesetmessages: + msg10349
2015-10-09 17:29:19jeff.allensetpriority: normal
assignee: jeff.allen
messages: + msg10342
nosy: + jeff.allen, zyasoft
2015-10-07 09:49:32darjussetnosy: + darjus
messages: + msg10313
2015-09-10 15:12:03adamburkesetfiles: + win_test_cleanup_2393.diff
keywords: + patch
messages: + msg10247
2015-09-10 15:10:11adamburkecreate