Issue2420

classification
Title: Regression tests leaking file handles on Linux
Type: crash Severity: normal
Components: Library Versions: Jython 2.7
Milestone: Jython 2.7.2
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: darjus, jeff.allen, stefan.richthofer, zyasoft
Priority: normal Keywords: test failure causes

Created on 2015-11-01.16:22:44 by jeff.allen, last changed 2016-02-27.22:38:15 by zyasoft.

Messages
msg10425 (view) Author: Jeff Allen (jeff.allen) Date: 2015-11-01.16:22:42
When I run "ant regrtest" on Linux, I get as far as test_tempfile, then it crashes with "Too many open files":

     [exec] Traceback (most recent call last):
     [exec]   File "/home/jeff/eclipse/jython-trunk/dist/Lib/test/regrtest.py", line 1544, in <module>
     [exec]   File "/home/jeff/eclipse/jython-trunk/dist/Lib/test/regrtest.py", line 415, in main
     [exec]   File "/home/jeff/eclipse/jython-trunk/dist/Lib/test/regrtest.py", line 565, in runtest
     [exec]   File "/home/jeff/eclipse/jython-trunk/dist/Lib/test/regrtest.py", line 580, in runtest_inner
     [exec] test test_tempfile crashed -- <type 'exceptions.IOError'>: [Errno 2] No such file or directory: '/home/jeff/eclipse/jython-trunk/dist/testreports/TEST-test.test_tempfile.test_exports.xml'
     [exec] test_textwrap
     [exec] IOError: [Errno 2] File not found - /home/jeff/eclipse/jython-trunk/dist/Lib/test/junit_xml$py.class (Too many open files)
     [exec] Result: 255

I observed this on Linux Mint 17.1 (which is Ubuntu 14.04).

However, test_tempfile is not really the culprit: it is just here that I finally hit a limit. If I monitor the number of open files as regression testing proceeds, it climbs alarmingly at certain points. I hacked in a system call to lsof (just keeping the last line), so I was able to watch it happen when running:

dist/bin/jython -m test.regrtest -u subprocess,network -e 

These are the major contributors, with the number of file handles they leave open:

        test_docxmlrpc          #  206 leaked handles
        test_httpservers        #  721 leaked handles
        test_imaplib            #   92 leaked handles
        test_socketserver       # 1344 leaked handles
        test_telnetlib          # 1588 leaked handles
        test_timeout            #  123 leaked handles
        test_urllib2_localnet   #  763 leaked handles
        test_xmlrpc             #  453 leaked handles


I'm omitting minor contributors and this took me a couple of goes. On a straight we are up to 4070 handles when test_tempfile comes along, wanting half a hundred open files, and we breach the apparent limit of 4096. Most of the "open files" show in lsof as "pipe" or "anon_inode".

I'm not able to say whether these leaks are due to careless writing in the test, or a leak in the libraries affecting real applications.

I'll include these in the expected failures for java.posix, although they don't fail detectably to regrtest.
msg10426 (view) Author: Jim Baker (zyasoft) Date: 2015-11-01.18:31:30
Jeff, this is very helpful in terms of using lsof to do this assessment. I too have noticed this problem, and it is because of using sockets.

I don't believe it's a fault in the socket module per se, which should be attempting to close out all Netty resources. But a lot of the tests rely on deterministic GC. So that's why test_socket, which previously would always reliably run out of files, uses the following function:

    def _assert_no_pending_threads(self, group, msg):
        # Ensure __del__ finalizers are called on sockets. Two things to note:
        # 1. It takes two collections for finalization to run.
        # 2. gc.collect() is only advisory to the JVM, never mandatory. Still 
        #    it usually seems to happen under light load.

        # Wait up to one second for there not to be pending threads

        for i in xrange(10):
            pending_threads = _check_threadpool_for_pending_threads(group)
            if len(pending_threads) == 0:
                break
            test_support.gc_collect()
            
        if pending_threads:
            print "Pending threads in Netty msg={} pool={}".format(msg, pprint.pformat(pending_threads))
        
Not running test_socket in the usual mix may be losing out on this attempt to do aggressive GC.

My expectation is that most Python code out there does the right thing and closes out resources like sockets. However, since the socket module does not support __enter__, __exit__, and I have code in the wild that does not do such closes, it's quite possible I'm wrongly hoping that!
msg10795 (view) Author: Jim Baker (zyasoft) Date: 2016-02-27.22:38:14
We should add support for finding resource leaks in regrtest. We can take advantage of existing support for this: -l (findleaks), -R (huntlrleaks), which are explicitly disabled for Jython in the current version.

Two things we can do:

1. Check for growing number of open files with lsof on platforms where this is supported, after each test is run. (There may be a Windows equivalent as well). We can also look at adding support for classloader leaks.

2. Check for garbage leaks using the support for gc debugging available as of 2.7.0.

This should be very helpful stabilization for 2.7.2.
History
Date User Action Args
2016-02-27 22:38:15zyasoftsetnosy: + darjus, stefan.richthofer
messages: + msg10795
milestone: Jython 2.7.2
2015-11-01 18:31:32zyasoftsetnosy: + zyasoft
messages: + msg10426
2015-11-01 16:22:44jeff.allencreate