Issue1602

classification

Title:	missing synchronized causes deadlocks in org.python.core.IdImpl
Type:	crash	Severity:	normal
Components:	Core	Versions:	2.5.1, 2.5.0
		Milestone:

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	zyasoft	Nosy List:	akruis, zyasoft
Priority:		Keywords:	patch

Created on 2010-04-26.09:25:49 by akruis, last changed 2010-04-27.22:29:40 by zyasoft.

Files
File name	Uploaded	Description	Edit	Remove
IdImpl.diff	akruis, 2010-04-26.09:25:49	Patch for this issue: diff against rev 7046
thread_dumps.txt	akruis, 2010-04-27.07:12:10

Messages
msg5748 (view)	Author: Anselm Kruis (akruis)	Date: 2010-04-26.09:58:42
The current implementation of org.python.core.IdImpl is not fully thread save, because the #java_obj_id(Object) method is not synchronized. IdImpl uses a java.util.HashMap to store ID-values for objects. The API-doc for java.util.HashMap requires to synchronize concurrent access to HashMap. IdImpl.java_obj_id(Object) method is indirectly called by thread.get_ident(). (thread.get_ident() -> org.python.core.Py.java_obj_id(Object) -> org.python.core.IdImpl.java_obj_id(Object) Django uses thread.get_ident() in its transaction management code. I observed two deadlocks on a multiprocessor server, that were apparently caused by the IdImpl#isMap HashMap being corrupted. I can provide the Java thread dumps if anybody is interested. The attached patch fixes this issue.
msg5750 (view)	Author: Jim Baker (zyasoft)	Date: 2010-04-26.15:30:13
Thanks for the simple patch. Could you submit a test case as well?
msg5752 (view)	Author: Anselm Kruis (akruis)	Date: 2010-04-27.07:12:10
Unfortunately, the bug is fairly hard to trigger. I couldn't reproduce it on my personal development system. You probably need: several threads, memory pressure to trigger garbage collection. At least this are the conditions on my pre-production system. I observed the deadlock twice within about 8-millions transactions. I have attached the thread dumps of both cases. Common to both cases is, that the "RUNNABLE" thread consumes 100% CPU, doesn't perform any system calls and does not return from java.util.HashMap.get / java.util.HashMap.removeEntryForKey. Both methods traverse a linked list and this became cyclic.
msg5754 (view)	Author: Jim Baker (zyasoft)	Date: 2010-04-27.22:29:40
Fixed in r7048 Thanks for the debug output. The infinite loop you observed on a hash bucket is perhaps the classical bug now seen in thread safety problems :) Fortunately it's also easy to fix. I'm going to mark this bug closed since the specific bug is basically not possible to recreate with unit tests (at least with our resources) and more importantly it's a well-known problem with a robust solution.

History
Date	User	Action	Args
2010-04-27 22:29:40	zyasoft	set	status: open -> closed resolution: accepted -> fixed messages: + msg5754
2010-04-27 07:12:14	akruis	set	files: + thread_dumps.txt messages: + msg5752
2010-04-26 15:30:26	zyasoft	set	assignee: zyasoft resolution: accepted
2010-04-26 15:30:14	zyasoft	set	nosy: + zyasoft messages: + msg5750
2010-04-26 09:58:42	akruis	set	messages: + msg5748
2010-04-26 09:25:49	akruis	create