Issue1602
Created on 2010-04-26.09:25:49 by akruis, last changed 2010-04-27.22:29:40 by zyasoft.
File name | Uploaded | Description
IdImpl.diff | akruis, 2010-04-26.09:25:49 | Patch for this issue: diff against rev 7046
thread_dumps.txt | akruis, 2010-04-27.07:12:10 |
|
msg5748 (view) |
Author: Anselm Kruis (akruis) |
Date: 2010-04-26.09:58:42 |
|
The current implementation of org.python.core.IdImpl is not fully thread-safe, because the #java_obj_id(Object) method is not synchronized. IdImpl uses a java.util.HashMap to store ID values for objects. The API documentation for java.util.HashMap requires external synchronization when multiple threads access a map concurrently and at least one of them modifies it.
The IdImpl.java_obj_id(Object) method is called indirectly by thread.get_ident():
thread.get_ident() -> org.python.core.Py.java_obj_id(Object) -> org.python.core.IdImpl.java_obj_id(Object)
Django uses thread.get_ident() in its transaction management code. I observed two deadlocks on a multiprocessor server that were apparently caused by the IdImpl#isMap HashMap being corrupted. I can provide the Java thread dumps if anybody is interested.
The attached patch fixes this issue.
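For illustration, the kind of fix described above can be sketched as follows. This is a minimal sketch and not the contents of IdImpl.diff: the class name SynchronizedIdRegistry, the method idFor, and the use of IdentityHashMap with strong references are assumptions made for the example. The real IdImpl may manage its keys differently.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical stand-in for an object-id registry; NOT the actual
// org.python.core.IdImpl source.
final class SynchronizedIdRegistry {
    // Assumption: strong references and identity comparison keep the sketch short.
    private final Map<Object, Long> ids = new IdentityHashMap<>();
    private long nextId = 1;

    // 'synchronized' makes the lookup-then-insert sequence atomic, so no two
    // threads can read or modify the underlying map at the same time.
    synchronized long idFor(Object o) {
        Long id = ids.get(o);
        if (id == null) {
            id = nextId++;
            ids.put(o, id);
        }
        return id;
    }
}
```

Because every access to the map goes through one synchronized method, the map is never observed mid-update. The cost is that all callers serialize on a single lock.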
|
msg5750 (view) |
Author: Jim Baker (zyasoft) |
Date: 2010-04-26.15:30:13 |
|
Thanks for the simple patch. Could you submit a test case as well?
|
msg5752 (view) |
Author: Anselm Kruis (akruis) |
Date: 2010-04-27.07:12:10 |
|
Unfortunately, the bug is fairly hard to trigger. I couldn't reproduce it on my personal development system. You probably need several threads and enough memory pressure to trigger garbage collection. At least these are the conditions on my pre-production system. I observed the deadlock twice within about 8 million transactions.
I have attached the thread dumps of both cases. Common to both is that the "RUNNABLE" thread consumes 100% CPU, doesn't perform any system calls, and does not return from java.util.HashMap.get / java.util.HashMap.removeEntryForKey. Both methods traverse a linked list, and this list had become cyclic.
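A stress test is unlikely to reproduce the corruption reliably, but it can at least check the invariant that a guarded lookup should hold under concurrency. The sketch below assumes Java 8 or later. The class IdStressSketch, its methods, and the thread and key counts are all hypothetical and not part of Jython. The timeout on Future.get turns a hang like the one in the thread dumps into an exception instead of a stuck test run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical stress harness; names and sizes are illustrative only.
final class IdStressSketch {
    private static final Map<Integer, Long> ids = new HashMap<>();
    private static long nextId = 1;

    // Guarded lookup: the HashMap is only touched while holding the class lock.
    static synchronized long idFor(int key) {
        Long id = ids.get(key);
        if (id == null) {
            id = nextId++;
            ids.put(key, id);
        }
        return id;
    }

    // Returns true if every thread observed the same id for every key.
    static boolean run(int threads, int keys) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<long[]>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                results.add(pool.submit(() -> {
                    long[] seen = new long[keys];
                    for (int k = 0; k < keys; k++) {
                        seen[k] = idFor(k);
                    }
                    return seen;
                }));
            }
            // A bounded wait converts an infinite loop into a TimeoutException.
            long[] first = results.get(0).get(30, TimeUnit.SECONDS);
            for (Future<long[]> f : results) {
                if (!Arrays.equals(first, f.get(30, TimeUnit.SECONDS))) {
                    return false;
                }
            }
            return true;
        } catch (Exception e) {
            throw new IllegalStateException("stress run failed or timed out", e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Calling IdStressSketch.run(8, 10000) exercises the lookup from eight threads. A passing run does not prove thread safety; it only fails to find a violation.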
|
msg5754 (view) |
Author: Jim Baker (zyasoft) |
Date: 2010-04-27.22:29:40 |
|
Fixed in r7048
Thanks for the debug output. The infinite loop you observed on a hash bucket is perhaps the classic bug seen in thread-safety problems :)
Fortunately it's also easy to fix.
I'm going to mark this bug closed since the specific bug is basically not possible to recreate with unit tests (at least with our resources) and more importantly it's a well-known problem with a robust solution.
|
Date | User | Action | Args
2010-04-27 22:29:40 | zyasoft | set | status: open -> closed; resolution: accepted -> fixed; messages: + msg5754
2010-04-27 07:12:14 | akruis | set | files: + thread_dumps.txt; messages: + msg5752
2010-04-26 15:30:26 | zyasoft | set | assignee: zyasoft; resolution: accepted
2010-04-26 15:30:14 | zyasoft | set | nosy: + zyasoft; messages: + msg5750
2010-04-26 09:58:42 | akruis | set | messages: + msg5748
2010-04-26 09:25:49 | akruis | create |