Issue2157

classification
Title: socket blocking/nonblocking/timeout issue
Type: Severity: normal
Components: Library Versions: Jython 2.7
Milestone:
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: mb_, zyasoft
Priority: Keywords:

Created on 2014-06-01.21:04:46 by mb_, last changed 2014-10-21.19:54:43 by mb_.

Files
File name Uploaded Description Edit Remove
test-results.txt mb_, 2014-06-19.10:03:33
Messages
msg8589 (view) Author: Michael Büsch (mb_) Date: 2014-06-01.21:04:45
My application switches between blocking and nonblocking mode of a socket with the socket.settimeout() method. The documentation says this:
> Setting a timeout of None disables
> the timeout feature and is equivalent to setblocking(1).
> Setting a timeout of zero is the same as setblocking(0).

The application frequently switches between timeout=any_value, timeout=0 and timeout=None while it is running.

So I do not call setblocking().

This works on CPython 2.7.
However on Jython it does not. It results in various weird timeout issues.
As a workaround, I added a setblocking(True) call just before the first settimeout(None) call at application startup. This fixes the issue. But according to the documentation, it seems redundant.

I was unable to cook up a simple test script, as I'm not completely sure where the root of the problem is, yet.
msg8687 (view) Author: Jim Baker (zyasoft) Date: 2014-06-19.00:23:33
Michael, your application seems to be a good test for this support in changing the timeout/blocking, which was only partially supported in Jython before (and not for SSL).

I agree about the apparent redundancy here. Reviewing the code for _realsocket in the _socket module in Jython, timeout is set to _defaulttimeout, which by default is None. The implementation of settimeout will for a true value set the timeout to None.

The only thing I can think of that could possibly explain this is if you are assuming thread safety. Apparently CPython does claim thread safety (http://bugs.python.org/issue1544279), if not in the docs, but there are some obvious places where this isn't the case in Jython's implementation.

I should point out that this thread safety claim doesn't seem to be actually true; see for example of the implementation of makefile and the implementation of _fileobject, which we share with one minor modification with CPython.
msg8748 (view) Author: Michael Büsch (mb_) Date: 2014-06-19.10:03:33
I'm completely unsure what's really going on there.
But what happens is that this commit is a workaround for my issue:
  http://bues.ch/gitweb?p=awlsim.git;a=commitdiff;h=efaac33f891acaaab3c393dc40e60d29824a144e
It changes the startup code of the application to initially do setblocking(True), even if we do settimeout(None) right after that and change the timeout all the time while running anyway.

So there must be some weird difference here. But I'm not able to cook up a standalone test for this. So this certainly is affected by other things that I don't know.
But threading isn't it. The application does not use threads. It runs two interpreter processes (via subprocess) and communicated between them using TCP sockets.

Attached it a log of my unit tests with the workaround applied and reverted.
You can try to reproduce, if desired. Just get the project with
  git clone git://git.bues.ch/awlsim.git
and run this, to run the failing unit test:
  ./tests/run.sh -i /PATH/TO/jython/dist/bin/jython ./tests/coreserver-cli.sh

For me this is 100% reproducible.

(Note that after the test was executed, you'll probably have to kill the server process manually. That's another bug related to subprocess not having terminate() in jython. If you don't kill it, it will error out with address-already-in-use)
msg8750 (view) Author: Jim Baker (zyasoft) Date: 2014-06-19.14:35:06
subprocess termination will be supported, so that will help on that issue: #1898
msg8949 (view) Author: Jim Baker (zyasoft) Date: 2014-09-05.16:14:13
Note: subprocess termination has been supported for a couple of months now
msg8958 (view) Author: Michael Büsch (mb_) Date: 2014-09-07.08:09:31
Calling terminate() still doesn't terminate the process.
What does terminate do internally? Does it send SIGTERM?
msg8969 (view) Author: Jim Baker (zyasoft) Date: 2014-09-08.01:32:21
We use http://docs.oracle.com/javase/7/docs/api/java/lang/Process.html#destroy()

This is better documented here:
http://docs.oracle.com/javase/8/docs/api/java/lang/Process.html#destroy--

My understanding is that for Unix-like systems, destroy uses SIGTERM and destroyForcibly (available in Java 8, not yet used by Jython although we should use if available) uses SIGKILL.
msg9088 (view) Author: Jim Baker (zyasoft) Date: 2014-10-06.03:10:26
First, this test only runs on Linux. But it now runs successfully against Jython trunk:

$ ./tests/run.sh -i ~/jythondev/jython27/dist/bin/jython ./tests/coreserver-cli.sh
=== Running tests with '/home/jbaker/jythondev/jython27/dist/bin/jython'
Running test 'coreserver-cli.sh' with 'jython' ...
--- Running coreserver tests
Running test 'shutdown.awl' with 'jython' ... [OK: 0:19.32]
--- Finished coreserver tests [OK]

All tests succeeded (with interpreter '/home/jbaker/jythondev/jython27/dist/bin/jython')

I assume we are now in good shape with the most recent changes in process support and socket lifecycle, so please tell if this is not the case.
msg9162 (view) Author: Michael Büsch (mb_) Date: 2014-10-21.19:50:57
I'm very sorry for my (too) late response.

The unit test only succeeds for me, because the code contains a workaround for this issue. See lines 673/674:
http://bues.ch/gitweb?p=awlsim.git;a=blob;f=awlsim/coreserver/messages.py;h=9d12d2dcbd001c8900faa7290981b955548c682b;hb=HEAD#l673

If I comment these two lines, the test starts to fail.

> === Running tests with '/home/mb/develop/hg/jython/dist/bin/jython'
> Running test 'coreserver-cli.sh' with 'jython' ... 
> --- Running coreserver tests
> Running test 'shutdown.awl' with 'jython' ... Traceback (most recent call last):
>   File "/home/mb/develop/git/awlsim/./tests/../awlsim-cli", line 523, in <module>
>     sys.exit(main())
>   File "/home/mb/develop/git/awlsim/./tests/../awlsim-cli", line 519, in main
>     return runWithServerBackend(inputFile)
>   File "/home/mb/develop/git/awlsim/./tests/../awlsim-cli", line 276, in runWithServerBackend
>     client.connectToServer()
>   File "/home/mb/develop/git/awlsim/./tests/../awlsim-cli", line 276, in runWithServerBackend
>     client.connectToServer()
>   File "/home/mb/develop/git/awlsim/awlsim/coreclient/client.py", line 142, in connectToServer
>     msg = self.transceiver.receive(timeout = 5.0)
>   File "/home/mb/develop/git/awlsim/awlsim/coreclient/client.py", line 142, in connectToServer
>     msg = self.transceiver.receive(timeout = 5.0)
>   File "/home/mb/develop/git/awlsim/awlsim/coreserver/messages.py", line 725, in receive
>     data = self.sock.recv(hdrLen - len(self.buf))
>   File "/home/mb/develop/hg/jython/dist/Lib/_socket.py", line 1134, in recv
>     data, _ = self._get_message(bufsize, "recv")
>   File "/home/mb/develop/hg/jython/dist/Lib/_socket.py", line 344, in handle_exception
>     return method_or_function(*args, **kwargs)
>   File "/home/mb/develop/hg/jython/dist/Lib/_socket.py", line 344, in handle_exception
>     return method_or_function(*args, **kwargs)
>   File "/home/mb/develop/hg/jython/dist/Lib/_socket.py", line 1100, in _get_message
>     msg = self._get_incoming_msg(reason)
>   File "/home/mb/develop/hg/jython/dist/Lib/_socket.py", line 1081, in _get_incoming_msg
>     self.incoming_head = self._handle_timeout(self.incoming.poll, reason)
>   File "/home/mb/develop/hg/jython/dist/Lib/_socket.py", line 786, in _handle_timeout
>     raise timeout("timed out")
> _socket.timeout: timed out
> === TEST FAILED ===
> Test 'shutdown.awl' FAILED
> Test failed with error code 1
msg9163 (view) Author: Michael Büsch (mb_) Date: 2014-10-21.19:54:43
My installed java is:

$ java -showversion
java version "1.7.0_65"
OpenJDK Runtime Environment (IcedTea 2.5.2) (7u65-2.5.2-4.1)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)

(Debian Sid)

Just in case that matters.
History
Date User Action Args
2014-10-21 19:54:43mb_setmessages: + msg9163
2014-10-21 19:50:58mb_setmessages: + msg9162
2014-10-21 19:20:21zyasoftsetstatus: pending -> closed
2014-10-06 03:10:26zyasoftsetstatus: open -> pending
resolution: fixed
messages: + msg9088
2014-09-08 01:32:22zyasoftsetmessages: + msg8969
2014-09-07 08:09:31mb_setmessages: + msg8958
2014-09-05 16:14:13zyasoftsetmessages: + msg8949
2014-06-19 14:35:06zyasoftsetmessages: + msg8750
2014-06-19 10:03:34mb_setfiles: + test-results.txt
messages: + msg8748
2014-06-19 00:23:34zyasoftsetnosy: + zyasoft
messages: + msg8687
2014-06-01 21:04:46mb_create