Issue1291
Created on 2009-03-28.00:47:36 by irmen, last changed 2009-04-19.13:38:06 by amak.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | Remove |
server.py | irmen, 2009-03-28.00:47:35 | server program that crashes | ||
client.py | irmen, 2009-03-28.00:47:52 | client program to run with server | ||
server.java | amak, 2009-04-05.13:24:40 | A server written in pure java that illustrates the bug. | ||
client.java | amak, 2009-04-05.13:25:25 | A client written in pure java that is used to make the server crash. |
Messages | |||
---|---|---|---|
msg4365 (view) | Author: Irmen de Jong (irmen) | Date: 2009-03-28.00:47:35 | |
on a certain use of select(), jython crashes with an IOException: Exception in thread Thread:Traceback (most recent call last): File "F:\jython25b3\Lib\threading.py", line 247, in _Thread__bootstrap self.run() File "F:\jython25b3\Lib\threading.py", line 238, in run self._target(*self._args, **self._kwargs) File "server.py", line 23, in handle rsock,wsock,esock=_selectfunction([csock],[],[],2) # <-- causes the crash File "F:\jython25b3\Lib\select.py", line 213, in cpython_compatible_select return native_select(read_fd_list, write_fd_list, outofband_fd_list, timeout) File "F:\jython25b3\Lib\select.py", line 171, in native_select pobj = poll() File "F:\jython25b3\Lib\select.py", line 63, in __init__ self.selector = java.nio.channels.Selector.open() IOException: java.io.IOException: Unable to establish loopback connection I am using the cpython_compatible_select function from jython's select module. See attached server.py and client.py that show the problem (the server.py will crash after a short while) It seems to be related to using |
|||
msg4366 (view) | Author: Irmen de Jong (irmen) | Date: 2009-03-28.00:49:55 | |
...It seems to be related to using the select inside a thread, or on a client socket instead of a server socket. If you remove the select in the handle() function in the server, it runs fine. |
|||
msg4367 (view) | Author: Alan Kennedy (amak) | Date: 2009-03-28.18:15:43 | |
Hi Irmen, Thanks for the detailed report, and the code that reproduces the bug. I can reliably recreate this bug now. More soon. |
|||
msg4370 (view) | Author: Alan Kennedy (amak) | Date: 2009-03-28.21:38:15 | |
OK, this is not necessarily a problem with our jython implementation: see attached server.java, which is the direct equivalent of the jython server.py that Irmen provided: it generates the same exception, with no jython involved. I'm reasonably sure that this is a Windows specific thing as well, but I need to check on a few different platforms. I think this may be caused by our method using java.nio.channels.Selector, which is really meant to be created once and used for the whole duration of the server. The jython select.select is implemented using Selector, and creates and destroys a Selector every single time the select.select function is called. The best way to work around this problem would be to use our select.poll object, in the way it is intended to be used, i.e. create one for the whole duration of the server. I'm going to report a bug Sun about this, but I'm not optimistic that it will be fixed (or even looked at, going on experience from previous similar java bugs). So we may just have to document this as a known bug, and live with it. I want to do some more experimentation though. |
|||
msg4371 (view) | Author: Irmen de Jong (irmen) | Date: 2009-03-28.22:25:59 | |
Ok in light of this, I will commit the workaround (remove the select on the client socket) in Pyro. I want it to work with Jython now, and not having to wait for a jdk fix ;-) |
|||
msg4372 (view) | Author: Alan Kennedy (amak) | Date: 2009-03-28.23:43:27 | |
This is alomost certainly a resource issue, on windows java. I had a suspicion that this issue is nothing to do with threading, so I rewrote the server to not spawn threads, but to process everything in a single thread. The "single_threaded_server.java" file is attached. I also modified the client to count the number of iterations it went through (counting_client.py attached). The exception occurs at (almost) exactly the same point each time, after between 3945 and 3955 iterations on my Windows Server 2003 box, with the following exception, which seems to indicate that buffer space has been exhausted. Exception: java.io.IOException: Unable to establish loopback connection Exception in thread "main" java.io.IOException: Unable to establish loopback connection at sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:106) at java.security.AccessController.doPrivileged(Native Method) at sun.nio.ch.PipeImpl.<init>(PipeImpl.java:122) at sun.nio.ch.SelectorProviderImpl.openPipe(SelectorProviderImpl.java:27) at java.nio.channels.Pipe.open(Pipe.java:133) at sun.nio.ch.WindowsSelectorImpl.<init>(WindowsSelectorImpl.java:106) at sun.nio.ch.WindowsSelectorProvider.openSelector(WindowsSelectorProvider.java:26) at java.nio.channels.Selector.open(Selector.java:209) at single_threaded_server.main(single_threaded_server.java:69) Caused by: java.net.SocketException: No buffer space available (maximum connections reached?): connect at sun.nio.ch.Net.connect(Native Method) at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:464) at java.nio.channels.SocketChannel.open(SocketChannel.java:146) at sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:78) ... 8 more |
|||
msg4373 (view) | Author: Alan Kennedy (amak) | Date: 2009-03-28.23:44:18 | |
Uploading the counting client. |
|||
msg4374 (view) | Author: Alan Kennedy (amak) | Date: 2009-03-28.23:45:26 | |
I'd be interested in reports of how many iterations before failure on other versions of windows, e.g. vista, XP, 2008, etc. |
|||
msg4375 (view) | Author: Alan Kennedy (amak) | Date: 2009-03-28.23:45:52 | |
Lastly, I'd like to know if anyone can reproduce this issue on any other platform besides windows. |
|||
msg4377 (view) | Author: Irmen de Jong (irmen) | Date: 2009-03-29.00:05:46 | |
Tried the single threaded server and counting client on my Mac (OSX 10.4). Server crashes with java.io.IOException: Too many open files after the client reached about 3400. |
|||
msg4378 (view) | Author: Alan Kennedy (amak) | Date: 2009-03-29.15:59:47 | |
I was surprised by the MacOS result. So I took a closer look at the java code I posted, and realised that there was a glaring omission: the selector's in that server code were never closed. So I updated the code to properly close all the selectors: the new "closing_single_threaded_server.java" is attached. And the code still reliably reproduces the bug. Except that the reported exception is different now. The new code reports "Address already in use: connect"; traceback given below. In an effort to try and find ways to prevent the bug from happening, I also added to two options, as follows 1. Cancel keys. Cancelling a SelectionKey removes it from the "interest set" of a Selector. I added this option to cancel keys to see if it made any difference to the exception. It didn't. 2. Reselect. In the early days of java.nio, there was some community advice to do another select on a Selector before closing it, after cancelling all the keys, to ensure that all SelectionKey cancellation would have taken effect before a close operation was attempted. (this was never official advice from Sun however). See the javadocs for more information about closing selectors. http://java.sun.com/j2se/1.5.0/docs/api/java/nio/channels/Selector.html But neither of the options makes any difference. Cancelling the keys makes no difference. Reselecting makes no difference. The IOException still occurs in all cases. As before, I'd be interested to see the behaviour of the "closing_single_threaded_server" on different platforms. Here is the new traceback. Exception in thread "main" java.io.IOException: Unable to establish loopback connection at sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:106) at java.security.AccessController.doPrivileged(Native Method) at sun.nio.ch.PipeImpl.<init>(PipeImpl.java:122) at sun.nio.ch.SelectorProviderImpl.openPipe(SelectorProviderImpl.java:27) at java.nio.channels.Pipe.open(Pipe.java:133) at sun.nio.ch.WindowsSelectorImpl.<init>(WindowsSelectorImpl.java:106) at sun.nio.ch.WindowsSelectorProvider.openSelector(WindowsSelectorProvider.java:26) at java.nio.channels.Selector.open(Selector.java:209) at closing_single_threaded_server$handler.process(closing_single_threaded_server.java:40) at closing_single_threaded_server.main(closing_single_threaded_server.java:90) Caused by: java.net.BindException: Address already in use: connect at sun.nio.ch.Net.connect(Native Method) at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:464) at java.nio.channels.SocketChannel.open(SocketChannel.java:146) at sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:78) ... 9 more |
|||
msg4431 (view) | Author: Jim Baker (zyasoft) | Date: 2009-04-04.02:52:37 | |
This *may* be a blocker for RC, so marking as high |
|||
msg4438 (view) | Author: Alan Kennedy (amak) | Date: 2009-04-04.15:22:10 | |
This looks very likely to be a java bug, on windows. If it is, then we can't fix it. So we can't let it hold up the 2.5 release; we can only report it to Sun, and document it as a "known issue", and recommend practices to workaround it. I need to investigate if it occurs on various JVMs versions. If it is fixed on Java 6, for example, then we can at least recommend to users who encounter the bug to use the bug free one. |
|||
msg4440 (view) | Author: Alan Kennedy (amak) | Date: 2009-04-05.13:24:40 | |
Uploading pure java version of the server. |
|||
msg4441 (view) | Author: Alan Kennedy (amak) | Date: 2009-04-05.13:25:25 | |
Uploading java version of the client. |
|||
msg4442 (view) | Author: Alan Kennedy (amak) | Date: 2009-04-05.13:32:36 | |
Using the attached pure java server and client, I can reliably reproduce the bug on JDK 1.5.0_17, on my Windows Server 2003 box. After approx 8000 iterations of the client, the server crashes with the following exception. Exception in thread "main" java.io.IOException: Unable to establish loopback connection at sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:106) at java.security.AccessController.doPrivileged(Native Method) at sun.nio.ch.PipeImpl.<init>(PipeImpl.java:122) at sun.nio.ch.SelectorProviderImpl.openPipe(SelectorProviderImpl.java:27) at java.nio.channels.Pipe.open(Pipe.java:133) at sun.nio.ch.WindowsSelectorImpl.<init>(WindowsSelectorImpl.java:106) at sun.nio.ch.WindowsSelectorProvider.openSelector(WindowsSelectorProvider.java:26) at java.nio.channels.Selector.open(Selector.java:209) at server$handler.process(server.java:42) at server.main(server.java:91) Caused by: java.net.BindException: Address already in use: connect at sun.nio.ch.Net.connect(Native Method) at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:464) at java.nio.channels.SocketChannel.open(SocketChannel.java:146) at sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:78) ... 9 more After the crash, when I issue the command "netstat -ano", I get thousands of sockets, in the state "TIME_WAIT", and pid of 0: Proto Local Address Foreign Address State PID TCP 127.0.0.1:1040 127.0.0.1:1052 TIME_WAIT 0 TCP 127.0.0.1:1042 127.0.0.1:1044 TIME_WAIT 0 TCP 127.0.0.1:1044 127.0.0.1:1050 TIME_WAIT 0 TCP 127.0.0.1:1047 127.0.0.1:1050 TIME_WAIT 0 TCP 127.0.0.1:1048 127.0.0.1:1050 TIME_WAIT 0 TCP 127.0.0.1:1050 127.0.0.1:1052 TIME_WAIT 0 TCP 127.0.0.1:1051 127.0.0.1:1052 TIME_WAIT 0 TCP 127.0.0.1:1052 127.0.0.1:1076 TIME_WAIT 0 TCP 127.0.0.1:1055 127.0.0.1:1059 TIME_WAIT 0 TCP 127.0.0.1:1057 127.0.0.1:1059 TIME_WAIT 0 |
|||
msg4443 (view) | Author: Alan Kennedy (amak) | Date: 2009-04-05.13:35:52 | |
Using the attached server and client, I can reliably reproduce the bug on jdk 1.6.0_13. The symptoms are exactly the same as on JDK 1.5, an "Unable to establish loopback connection", caused by "Address already in use: connect". Again, a huge list of sockets in the TIME_WAIT state with a PID of 0 is listed by "netstat -ano". |
|||
msg4445 (view) | Author: Alan Kennedy (amak) | Date: 2009-04-05.13:58:10 | |
OK, I've reported this bug in java to Sun. Though I'm not hopeful; they completely ignored my bug report the last time I submitted one, with full source code to reliably reproduce. http://bugs.jython.org/issue1018 |
|||
msg4446 (view) | Author: Alan Kennedy (amak) | Date: 2009-04-05.14:39:45 | |
I have documented this on the socket module wiki as a "known issue". http://wiki.python.org/jython/NewSocketModule#Multipleuseofselect.selectresultinanExceptiononWindows |
|||
msg4449 (view) | Author: Giampaolo Rodola' (billiejoex) | Date: 2009-04-05.19:07:58 | |
> The best workaround in this case > is to use select.poll() objects > instead of the select.select() function. This is incorrect since poll() doesn't exist on Windows. |
|||
msg4450 (view) | Author: Alan Kennedy (amak) | Date: 2009-04-05.20:18:18 | |
[Giampaolo] > This is incorrect since poll() doesn't exist on Windows. Perhaps the poster has misunderstood. java.nio.channels.Selector is the only class available on the JVM that permits multiplexing sockets. java.nio.channels.Selector, the java object, is currently defective on Windows, because of the bug illustrated. Jython software using this class will be similarly affected on Windows, i.e. the jython select module. There may be a deeper relationship between the facts that A. java.nio.channels.Selector has problems on Windows B. Cpython's select module on Windows does not have a poll object. But in terms of jython implementation, we're restricted to java classes, unless we want to go to JNI, which could a very complex undertaking, that the cpython folks avoided. I'm beginning to regret supporting the select.select function. It's an unnatural way to use the java APIs. It seem like a neat solution to the problem at the time, but I now think goes against the grain of the intention of the java APIs, especially given the common use cases that are appearing from cpython. I think now perhaps that it would have been better to simply support select.poll objects, to make it clear to the user that select semantics on jython are different from cpython, and have to be thought of on java terms. |
|||
msg4451 (view) | Author: Giampaolo Rodola' (billiejoex) | Date: 2009-04-05.23:14:54 | |
> There may be a deeper relationship between the facts that > A. java.nio.channels.Selector has problems on Windows > B. Cpython's select module on Windows does not have a poll object. It's not that cPython avoided to add Windows poll() support. poll() system call is *UNIX only*. > I'm beginning to regret supporting the select.select function. Having select() support is good for Windows users since it's often the only chance they have for doing socket multiplexing. |
|||
msg4454 (view) | Author: Alan Kennedy (amak) | Date: 2009-04-06.09:33:42 | |
[Giampaolo] > It's not that cPython avoided to add Windows poll() support. > poll() system call is *UNIX only*. The poster is confused. The statement "poll() system call is *UNIX only*" applies to cpython, *not* to jython. Jython *has* a poll object on Windows, and it works. [Giampaolo] > Having select() support is good for Windows users since it's often the > only chance they have for doing socket multiplexing. The poster has not been reading what I've been writing. The statement above applies only to cpython, not to jython. By necessity, jython must use java.nio.channels.Selector to implement the select function. But this usage results in an "unnatural" usage of the java API, and gives rise to the problem being discussed in this thread. I refer you to the code for the actual implementation of the jython select module. http://fisheye3.atlassian.com/browse/~raw,r=trunk/jython/trunk/jython/Lib/select.py If the poster has better implementation options, patches are welcome. Further discussion based on confusing cpython with jython will serve no purpose. http://www.jython.org/Project/userfaq.html |
|||
msg4485 (view) | Author: Kevin Fitch (kfitch42) | Date: 2009-04-08.12:35:41 | |
From what I can tell, the underlying problem here is that there is an implicit leak in the java Selector object related to lookback connections, that it uses for whatever reason. So perhaps the best workaround would be to introduce our own 'leak' that will cover up the java one :) What if we had a poll pool (say that 10 times fast :)? The naive implementation would make sure we never created more poll objects (and hence selectors) than the maximum number of simultaneous calls to select, instead of creating as many as there are calls to select. The only outstanding question would be: when do we clean the pool? Perhaps it would be good enough to just let things accumulate. In 'normal' applications (at least in my definition of normal) we would only end up with a small handful of poll objects that live forever in the pool. A quick and dirty pool implementation (I haven't run/tested this yet): class _selectPollPool(): _pollPool=[] _mtx = threading.mutex() @staticmethod def getPollObject(): with _mtx: if _pollPool: return _pollPool.pop() else: return poll() def putPollObject(pobj) pobj.close() with _mtx: _pollPool.append(pobj) We could then just replace the 'pobj = poll()' line in native_select with 'pobj = _selectPollPool.getPollObject()' and the pobj.close() in the finally block with _selectPollPool.putPollObject(pobj) |
|||
msg4490 (view) | Author: Kevin Fitch (kfitch42) | Date: 2009-04-08.20:27:50 | |
I have been unable to reproduce this bug under Linux (using the java or the jython code). It is definitely Windows (and may OSX) specific. |
|||
msg4491 (view) | Author: Giampaolo Rodola' (billiejoex) | Date: 2009-04-09.02:13:40 | |
Sorry for my lack of Jython experience. As far as I can tell I've just tried to run pyftpdlib [1] test suite by using select.poll() instead of select.select() but I get the same "Unable to establish loopback connection" error (Windows XP SP3, Jython 2.5b3). |
|||
msg4495 (view) | Author: Alan Kennedy (amak) | Date: 2009-04-09.12:16:17 | |
[Kevin] > From what I can tell, the underlying problem here is that there is an > implicit leak in the java Selector object related to lookback > connections, that it uses for whatever reason. So perhaps the best > workaround would be to introduce our own 'leak' that will cover up the > java one :) > > What if we had a poll pool (say that 10 times fast :)? The naive > implementation would make sure we never created more poll objects (and > hence selectors) than the maximum number of simultaneous calls to > select, instead of creating as many as there are calls to select. That's a (great idea)*10 :-) I'm not 100% convinced it will work, because we won't be able to close() the poll objects in the pool, which might create problems with outstanding references to SocketChannels. This is because they cannot be re-opened: there is no open() method on Selectors: they can only be created with the static factory method java.nio.channels.Selector.open(), which returns them already opened. http://java.sun.com/j2se/1.5.0/docs/api/java/nio/channels/Selector.html#open() But it's the best idea proposed so far to solve this problem. I'm working on a patch now; more later today. |
|||
msg4497 (view) | Author: Kevin Fitch (kfitch42) | Date: 2009-04-09.19:15:30 | |
I can't seem to reproduce this. I tried out the jython (using 2.5b3) version on Win XP Pro SP3 (32-bit) with Sun java 1.6.0_04 I gave up when it got to around 250000 iterations. The java one is still running, but is at 69000 right now, with no probs. |
|||
msg4500 (view) | Author: Alan Kennedy (amak) | Date: 2009-04-11.11:15:31 | |
I've checked in an implementation of a cache for poll objects, so that poll objects/Selectors are re-used as much as possible. It is currently only used on MS Windows. It seems stable enough. I ran 6 of the counting clients against it yesterday for a few hours; they accumulated over 7,000,000 iterations between them, with no problems at all. So, please can everyone who has been having problems with this bug try out the new code, which was checked in at r6208. http://fisheye3.atlassian.com/browse/jython/trunk/jython/Lib/select.py |
|||
msg4528 (view) | Author: Alan Kennedy (amak) | Date: 2009-04-17.16:28:30 | |
Ping. Has anyone been able to reproduce the bug with the new patched version of the select module? If not, I'll consider this bug fixed. |
|||
msg4530 (view) | Author: Giampaolo Rodola' (billiejoex) | Date: 2009-04-17.17:39:25 | |
I tried pyftpdlib test suite on Windows by using both select() and poll() and I don't get any "IOException: java.io.IOException:" exception now. |
|||
msg4540 (view) | Author: Alan Kennedy (amak) | Date: 2009-04-19.13:38:06 | |
Two successful reports that this fix as eliminated the exception, one on this bug (from billiejoex) and one from kfitch42 on this bug http://bugs.jython.org/issue1304 Therefore, closing this bug. Thanks to Kevin Fitch for the suggestion of the eventual solution. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2009-04-19 13:38:06 | amak | set | status: open -> closed resolution: fixed messages: + msg4540 |
2009-04-17 17:39:25 | billiejoex | set | messages: + msg4530 |
2009-04-17 16:28:30 | amak | set | messages: + msg4528 |
2009-04-11 11:15:32 | amak | set | messages: + msg4500 |
2009-04-09 19:15:31 | kfitch42 | set | messages: + msg4497 |
2009-04-09 12:16:19 | amak | set | messages: + msg4495 |
2009-04-09 02:13:41 | billiejoex | set | messages: + msg4491 |
2009-04-08 20:27:51 | kfitch42 | set | messages: + msg4490 |
2009-04-08 12:35:41 | kfitch42 | set | messages: + msg4485 |
2009-04-07 21:42:18 | kfitch42 | set | nosy: + kfitch42 |
2009-04-06 09:33:43 | amak | set | messages: + msg4454 |
2009-04-05 23:14:55 | billiejoex | set | messages: + msg4451 |
2009-04-05 20:18:19 | amak | set | messages: + msg4450 |
2009-04-05 19:07:59 | billiejoex | set | messages: + msg4449 |
2009-04-05 14:39:45 | amak | set | messages: + msg4446 |
2009-04-05 13:58:10 | amak | set | messages: + msg4445 |
2009-04-05 13:35:52 | amak | set | messages: + msg4443 |
2009-04-05 13:32:37 | amak | set | messages: + msg4442 |
2009-04-05 13:25:25 | amak | set | files:
+ client.java messages: + msg4441 |
2009-04-05 13:24:41 | amak | set | files:
+ server.java messages: + msg4440 |
2009-04-05 13:22:22 | amak | set | files: - closing_single_threaded_server.java |
2009-04-05 13:22:16 | amak | set | files: - counting_client.py |
2009-04-05 13:22:10 | amak | set | files: - single_threaded_server.java |
2009-04-05 13:22:04 | amak | set | files: - server.java |
2009-04-04 15:22:10 | amak | set | messages: + msg4438 |
2009-04-04 02:52:37 | zyasoft | set | priority: high nosy: + zyasoft messages: + msg4431 |
2009-04-03 16:38:41 | billiejoex | set | nosy: + billiejoex |
2009-03-29 15:59:48 | amak | set | files:
+ closing_single_threaded_server.java messages: + msg4378 |
2009-03-29 00:05:46 | irmen | set | messages: + msg4377 |
2009-03-28 23:45:52 | amak | set | messages: + msg4375 |
2009-03-28 23:45:26 | amak | set | messages: + msg4374 |
2009-03-28 23:44:18 | amak | set | files:
+ counting_client.py messages: + msg4373 |
2009-03-28 23:43:29 | amak | set | files:
+ single_threaded_server.java messages: + msg4372 |
2009-03-28 22:25:59 | irmen | set | messages: + msg4371 |
2009-03-28 21:38:17 | amak | set | files:
+ server.java messages: + msg4370 |
2009-03-28 18:15:44 | amak | set | assignee: amak messages: + msg4367 nosy: + amak |
2009-03-28 00:49:56 | irmen | set | messages: + msg4366 |
2009-03-28 00:47:52 | irmen | set | files: + client.py |
2009-03-28 00:47:36 | irmen | create |
Supported by Python Software Foundation,
Powered by Roundup