Issue1974

classification
Title: Threading broken in jython 2.5.3
Type: Severity: normal
Components: Core Versions: Jython 2.5
process
Status: open Resolution: remind
Dependencies: Superseder:
Assigned To: fwierzbicki Nosy List: fwierzbicki, mishikal@yahoo.com
Priority: normal Keywords:

Created on 2012-10-02.17:09:22 by mishikal@yahoo.com, last changed 2013-02-19.23:13:30 by fwierzbicki.

Messages
msg7469 (view) Author: Quanah Gibson-Mount (mishikal@yahoo.com) Date: 2012-10-02.17:09:21
I recently upgraded from jython 2.5.2 to jython 2.5.3 standalone jars. 
After this upgrade, we found that threading is broken in 2.5.3, when a 
process tries to join an existing thread.

Logs from 2.5.2:

Oct  1 15:18:27 zre-ldap002 zmconfigd[28096]: Found 3 threads
Oct  1 15:18:27 zre-ldap002 zmconfigd[28096]: Active Thread listener found
Oct  1 15:18:27 zre-ldap002 zmconfigd[28096]: Active Thread Thread found
Oct  1 15:18:27 zre-ldap002 zmconfigd[28096]: Active Thread MainThread found
Oct  1 15:18:27 zre-ldap002 zmconfigd[28096]: Fetching All configs

Logs from 2.5.3:

Oct  1 15:04:06 zre-ldap002 zmconfigd[23938]: Found 3 threads
Oct  1 15:04:06 zre-ldap002 zmconfigd[23938]: Active Thread listener found
Oct  1 15:04:06 zre-ldap002 zmconfigd[23938]: Active Thread Thread-8329 
found
Oct  1 15:04:06 zre-ldap002 zmconfigd[23938]: Attempting to join() 
Thread-8329
Oct  1 15:04:11 zre-ldap002 zmconfigd[23938]: join() Thread-8329 FAILED
Oct  1 15:04:11 zre-ldap002 zmconfigd[23938]: Hung threads detected (3 
total), exiting


This is infinitely repeatable.


Our loop has:

while True:

        Log.logMsg (4, "Found %d threads" % threading.activeCount())

        for th in threading.enumerate():
                Log.logMsg(4, "Active Thread %s found" % th.getName())
                if (th.getName() != "listener" and th.getName() != "MainThread" and th.getName() != "Thread" and th.getName != "SIGUSR2 handler"):
                        Log.logMsg(4, "Attempting to join() %s" % th.getName())
                        th.join(5);
                        if (th.isAlive()):
                                Log.logMsg(1, "join() %s FAILED" % th.getName())
                                Log.logMsg(1, "Hung threads detected (%d total), exiting" % threading.activeCount());
                                sys.exit(1)

        t1 = time.clock()

        try:
                # read all the configs
                myState.getAllConfigs(myConfig)
        except Exception, e:
                [Log.logMsg(1,t) for t in traceback.format_tb(sys.exc_info()[2])]
                if myState.forced:
                        Log.logMsg(0, "Key lookup failed.")
                Log.logMsg(1, "Sleeping...Key lookup failed (%s)" % (e,))
                time.sleep(60)
                continue

        try:
                # read zmconfigd config
                myState.getMtaConfig(myConfig.configFile)

                # watchdog restarts apps if they are not running
                watchdog()

                # check for config changes
                myState.compareKeys()

                Log.logMsg (5, "LOCK myState.lAction requested")
                myState.lAction.acquire()
                Log.logMsg (5, "LOCK myState.lAction acquired")
                myState.compileActions()
                myState.requestedconfig = {}
                myState.doConfigRewrites()
                myState.lAction.notifyAll()
                myState.lAction.release()
                Log.logMsg (5, "LOCK myState.lAction released")

                # executes rewrites/postconf/restarts
                if myConfig.restartconfig:
                        myState.doRestarts()
        except Exception, e:
                [Log.logMsg(1,t) for t in traceback.format_tb(sys.exc_info()[2])]
                if myState.forced and myState.forced < 100:
                        Log.logMsg(0, "Configuration inconsistency detected (%s)" % (e,))
                Log.logMsg(1, "Sleeping...Configuration inconsistency detected (%s)" % (e,));
                time.sleep(60)
                continue

        if myState.forced:
                break

        # start the listener after we have the lock, or an early request can cause problems
        # start the listener after the rewrites, so the start script doesn't return before they're complete
        if myState.firstRun and not myState.forced:
                requestListener()

        Log.logMsg (5, "LOCK myState.lAction released")
        myState.firstRun = False
        lt = time.clock()-t1
        Log.logMsg(4, "Loop completed in %.2f seconds" % (lt,));
        Log.logMsg(4, "Sleeping for %d." % (myConfig.interval,));

        # Jython won't wake up from time.sleep() when a signal is received and caught.
        # Uncaught signals seem to cause the JVM to exit
        myState.sleepTimer = myConfig.interval
        Log.logMsg (5, "Sleeping for %d" % myState.sleepTimer)
        while myState.sleepTimer > 0:
                time.sleep(sleepinterval)
                myState.sleepTimer -= sleepinterval
        Log.logMsg (5, "Waking up")

dt = time.clock()-t0
Log.logMsg(5, "%s completed in %.2f seconds" % (myConfig.progname,dt));
sys.exit(0)
msg7620 (view) Author: Frank Wierzbicki (fwierzbicki) Date: 2013-02-05.18:46:28
Hi,

I can't currently reproduce this since your "Log" object and "myState" (and maybe others) are not defined in this script. If you could supply the missing parts I would have a better chance of reproducing your issue and looking into it.
msg7625 (view) Author: Quanah Gibson-Mount (mishikal@yahoo.com) Date: 2013-02-05.23:11:06
You can get the bits we import from:
https://github.com/Zimbra-Community/zimbra-sources/tree/master/main/ZimbraServer/src/python/pylibs

And a current copy of zmconfigd from:
https://github.com/Zimbra-Community/zimbra-sources/blob/master/main/ZimbraServer/src/libexec/zmconfigd
msg7665 (view) Author: Frank Wierzbicki (fwierzbicki) Date: 2013-02-11.19:57:36
Hi Quanah,

Debugging threading issues with large dependencies is a bit tough, is there any way that you could narrow the bug down to a self contained smallish code sample so we can narrow this down?
History
Date User Action Args
2013-02-19 23:13:30fwierzbickisetresolution: remind
versions: + Jython 2.5, - 2.5.3b2
2013-02-11 19:57:36fwierzbickisetpriority: normal
assignee: fwierzbicki
messages: + msg7665
2013-02-05 23:11:06mishikal@yahoo.comsetmessages: + msg7625
2013-02-05 18:46:28fwierzbickisetnosy: + fwierzbicki
messages: + msg7620
2012-10-02 17:09:22mishikal@yahoo.comcreate