Issue2834

classification
Title: import of Java classes is not thread safe
Type: crash Severity: critical
Components: Core Versions: Jython 2.7.2
Milestone: Jython 2.7.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: jeff.allen Nosy List: FraOrolo, jeff.allen, zyasoft
Priority: high Keywords:

Created on 2019-11-20.12:55:49 by FraOrolo, last changed 2020-02-01.09:00:18 by jeff.allen.

Messages
msg12780 (view) Author: Martin Ginkel (FraOrolo) Date: 2019-11-20.12:55:49
We have short Jython scripts executed on multiple threads embedded in Java. Some of these import all methods from a Java class:
from x.y.z.AJavaClass import *

in multi-threaded executions this often fails because the PyJavaType for the class is still built on one Thread, while some other thread already accesses the PyJavaType and does not find necessary methods.

This is a regression from Jython 2.7.1, the scripts worked reliably on the old system.
msg12792 (view) Author: Jeff Allen (jeff.allen) Date: 2019-11-25.00:22:37
That's a surprise as something like that manifested in 2.7.1, but was fixed in #2487. Or was it?
msg12804 (view) Author: Jim Baker (zyasoft) Date: 2019-11-26.16:55:50
It is surprising given that

1) This support in PyType/PyJavaType was rewritten in 2.7.2 (after being also rewritten in 2.7.1) to better support multithreading by safely publishing the proxy type to Java in its mapping, including inner classes. See https://bugs.jython.org/issue2609

2) imports in general should always sequence behind the module import lock (but that lock is per sys/PySystemState, so if the scripts are being run in separate PySystemState objects, we can discount).

@FraOrolo, could you submit a test script showing this behavior?

Having said that, importing methods from a class like `from x.y.z.AJavaClass import *` could very well be undertested, since that's not so common at least for me (!) vs dispatching from the object (unless we are talking about only static methods). Again, a test script would be very helpful.
msg12810 (view) Author: Martin Ginkel (FraOrolo) Date: 2019-11-27.09:01:12
Ok, after looking after the other changes I can state the following details:

* We had problems with PySystemState before, we use multiple and for 2.7.1 It worked best to use PySystemState from a ThreadLocal. So every thread with a script has his own state. 
The Python also uses a separated ClassLoader and Plugin Architecture, therefore we create multiple PySystemStates. 

* If I understand the comments correctly, multiple PySystemState is the wrong way. I can change and test if it works with just one.

* The import * is indeed for a bunch of static Utility-Methods that 
are provided by the Java class.
msg12828 (view) Author: Jeff Allen (jeff.allen) Date: 2019-12-13.07:54:20
I'm moderately confident of the type system now. Less so of PySystemState, sys and the import mechanism, all in relation to concurrency.

I think multiple system states should be a legitimate thing to do. We may not have the right model.
msg12834 (view) Author: Jeff Allen (jeff.allen) Date: 2019-12-14.14:02:42
If it is rather to do with import, #2642 may be another piece of the puzzle.
msg12850 (view) Author: Martin Ginkel (FraOrolo) Date: 2019-12-17.11:20:25
Ok, had some time for experiments:
* I tried to use less PySystemStates, this makes it worse. One thread with Jython needs a separate SystemState
* Im still not sure how the synchronization in PyType.Registry.classToType.computeValue() can be missed.
  I see a problem in the fact, that there is no control for insertion into the classValue. I'm feeling unsure about the fact that multiple threads will re-create the PyType wrapping, because they are waiting in front of the lock and do not check if the first import and wrapping was successful.
But in principle this should be fine, just more PyJavaType wrappings are created.
* Interesting for me is: our Java Utitility classes are not referenced by other Java classes that would make them prone to concurrent recursive JavaType wrapping. The class that errors is not referenced by any other Java class.
msg12898 (view) Author: Jeff Allen (jeff.allen) Date: 2019-12-25.14:32:31
I have also had some time to experiment (between presents and lunch).

I am not able to create a failure with my test program that imports * from javax.swing.Utilities (a class we won't already have imported, providing static methods as in the OPs case). I've written a unit test where we try to import to an interpreter in a Java Thread (or rather, lots of them). It would be good to have some concurrency unit tests as standard.

I currently suspect logic in the vicinity of PyJavaPackage. I notice that when found by the JavaImporter, the __dict__ of a PyJavaPackage contains entries only for __name__ and further packages. The classes and their attributes seem to get filled in later.

Finding the package happens under the import lock, and the import lock is taken again when filling in (I think), but I'm not yet sure the package can't escape between times. Does the import lock (per system state) anyway really protect objects that come (I think) from a shared structure?

It might be another case of assuming objects need to be synchronised only when modifying them.
msg12900 (view) Author: Jeff Allen (jeff.allen) Date: 2019-12-25.14:37:04
I mean javax.swing.text.Utilities (https://docs.oracle.com/javase/8/docs/api/javax/swing/text/Utilities.html).
msg12902 (view) Author: Jeff Allen (jeff.allen) Date: 2019-12-26.12:24:31
I have been playing with competing threads, guiding them step by step to see if I can make one trip up the other. In the process, I decided to check my understanding of ClassValue. It is as I thought, although I've never watched it work before.

Martin wrote
"""
I see a problem in the fact, that there is no control for insertion into the classValue. I'm feeling unsure about the fact that multiple threads will re-create the PyType wrapping, because they are waiting in front of the lock and do not check if the first import and wrapping was successful.
But in principle this should be fine, just more PyJavaType wrappings are created.
"""

Recall I am importing javax.swing.text.Utilities, and so Jython will make a PyJavaType to represent its class.

I let Thread-1 call PyType.fromClass() and stopped it in PyType$Registry.resolveType(). This means it has come through ClassValue.get(), missed in the cache, and called computeValue(). I then let Thread-3 do the same, and observed it blocked in computeValue(). I let Thread-1 return through ClassValue.get(), which means the cache in the ClassValue is filled for Utilities. This is too late for Thread-3, which proceeds to construct a second PyJavaType to represent Utilities. On the return journey through ClassValue, Thread-3 fails to insert its result in the cache, and is forced to repeat. This time, it picks up PyJavaType that Thread-1 inserted, so both refer to the same type object, with same Java object id: marvellous!

Thread-3 did work and it became garbage immediately, but this is the price of the non-blocking mechanism we have in ClassValue. The creation of type objects must, however, be idempotent as far as the rest of the process state is concerned. (I think it is.)

Now, let's see if I can trick them into a different error ...
msg12904 (view) Author: Jeff Allen (jeff.allen) Date: 2019-12-26.16:54:31
There is a race in org.python.core.PyJavaPackage.addClass(String, Class<?>) where the __dict__ may be updated by threads that are importing into two different PySystemState. The distinct import locks do not stop them running at the same time, but here they are updating (in my example) the PyJavaPackage representing javax.swing.text, in order to add an entry for Utilities that will map to the PyJavaType that represents it.

This does not seem very serious, since both threads (Thread-1 and Thread-3 in my example) are inserting the same object. Of course, it's still wrong.

In the case where each PySystemState has its own class loader and the class in question (a user-defined one, I suppose) enters through a distinct class loader in each thread, these are different classes and will have distinct PyJavaType objects. Unless the (Java) package managers also segregate their hierarchies by loader, there seems a risk here of mixing entries in sys.modules. I'll see if I can make that happen.
msg12908 (view) Author: Jeff Allen (jeff.allen) Date: 2019-12-27.18:45:45
I built myself a unit test in which each runner (thread) has its own sys instance and sys.classLoader. My Python script begins "from thin.air.Foo import *", which is a Java class that exists only as a String in the test, and gets compiled into a byte array only my special class loader class can reach. Each sys.classLoader is an insrtance of that class.

This was fun to work out, but that's another story.

In this arrangement, if you ask for thin.air.Foo via the class loader of a particular sys, you should find (as I understand it) that each gives you a distinct java.lang.Class object, and a distinct PyType must follow. This does not happen.

If I chase two threads through the import mechanism, Thread-1 manufactures JavaPackage objects in the package manager for "thin" and "thin.air". It goes via org.python.core.Py.findClassInternal, where classLoader != null, for its particular sys module, so it loads thin.air.Foo via that. It creates a PyJavaType for it (fully, under the type registry lock) and in org.python.core.PyJavaPackage.addClass(String, Class<?>) it places an entry "Foo" in the __dict__ of the PyJavaPackage for "thin.air". As it does this, Thread-1 also makes entries in sys.modules for thin and thin.air.

So far, so good.

Now, let Thread-2 try the same thing. Because it has a distinct sys, it does not find entries there made by Thread-1 in it's sys.modules (good), but each time it tries to resolve one of the elements of thin.air.Foo at org.python.core.PyJavaPackage.__findattr_ex__(String), it finds entries there, thanks to Thread-1. The outcome of this is that Thread-2 never calls its loader to load thin.air.Foo, instead finding the PyJavaType Thread-1 created.

I'm pretty sure that's not what we want.
msg12910 (view) Author: Jeff Allen (jeff.allen) Date: 2019-12-27.23:32:58
Actually, I wrote too soon.

If I let Thread-2 overtake Thread-1, after finding Thread-1's definition, it makes it into org.python.core.imp.ensureFromList(). Here it starts looking for a package called thin.air.Foo.*, and does consult the specific class loader, although with a variety of spurious names (like "Foo.class.__doc__"). (See also #2854.)

The root of the problem I've found is that distinct sys modules continue to share a common package manager, which is a static member of PySystemState. The solution, I think, is to ensure each sys has its own package manager, and that they are synchronised properly (to cover the case where multiple threads access one PySystemState).

Meanwhile, every import works just fine: no missing attributes or reported exceptions. So I'm not at all sure I'm studying the problem Martin Ginkel observed. The original post is a pretty fair description of how 2.7.1 behaved and 2.7.2 doesn't.

I have pushed the change which adds my JUnit test. In the interests of a clean build, the failing assertions are commented out.

https://hg.python.org/jython/rev/e306270a4771

Confirmation would be appreciated that I'm reproducing the circumstances of this issue. Or a script that reproduces the actual issue.
msg12912 (view) Author: Jeff Allen (jeff.allen) Date: 2020-01-10.07:44:29
@FraOlo: do you think this accounts for what you are seeing, or have I simply found a different bug?
msg12914 (view) Author: Jeff Allen (jeff.allen) Date: 2020-01-24.07:42:02
I have stumbled into a test that reasonably-well reproduces what I think is the original problem. I have a test resembling:

from javax.swing.text.Utilities import *
f = getNextWord # May raise NameError
 
If I invoke this simultaneously from 100 threads, it fails just occasionally.

I attempted to disturb the timing, hoping for more frequent failures, by adding a sleep:

from java.lang import Thread
Thread.sleep(instance*10)  # May raise AttributeError
from javax.swing.text.Utilities import *
f = getNextWord            # May raise NameError

This second one fails about one time in five, claiming Thread has no sleep. I suspect it is a cache problem, not a construction one, but I don't understand how it arises.

The problem I found 27/12 (that the cache produces classes that originate in the wrong class loader) is real, and I have a test that fails every time, but I think it is not the one reported.
msg12918 (view) Author: Jeff Allen (jeff.allen) Date: 2020-01-25.10:40:13
[get the name right!]

I managed to trap an example of this going wrong. I hope I've now got the right error. The problem is essentially that a recursive call to PyType.fromClass() is capable of premature publication. Such a call occurs during construction of the MRO. Any thread not already waiting inside the type system will then pick up an incomplete PyType from the ClassValue.

We do need those incomplete values internally, in order to build structures like the MRO that refer to the object that contains them.

We need a way to allow PyType.fromClass to return an incomplete type to the unique thread already working, while not binding it for other threads to find. I notice that it is allowable for computeValue() to throw an exception and no value is then bound, which may provide the back-channel I need.
msg12920 (view) Author: Martin Ginkel (FraOrolo) Date: 2020-01-25.11:03:58
Hi Jeff,

sorry, have been offline for while.
I tried to reproduce and imagine some way that the 'package bug' from 2019-12-27 affects our code. But there is only one package, one class with a lot of static methods. 
I'm pretty sure this is not the cause of my problem.

The other variant, you described and reported today would fit much better. We do have 4-8 threads running the same import at almost the same time. Sometimes one of them traps into some premature PyJavaType that does not know the static methods (yet).
msg12922 (view) Author: Jeff Allen (jeff.allen) Date: 2020-01-25.23:31:25
Hi Martin.

I have fixed the publication bug my test demonstrates, so perhaps your problem. I'm up to 700 repeats of 100 threads without it failing. Previously it would fail after a few tens of repeats.

The exception technique almost does it (there's another twist). I'll tidy up and push it this weekend.
msg12936 (view) Author: Jeff Allen (jeff.allen) Date: 2020-01-26.22:03:03
Fixed, pending confirmation, at https://hg.python.org/jython/rev/44280f7e1854
msg12946 (view) Author: Martin Ginkel (FraOrolo) Date: 2020-01-31.11:11:42
Hi,

I have integrated the change into our production build and the tests did not hit the issue for a couple of days now. 
Therefore I believe you fixed the Issue, GREAT!

Thanks a lot
Martin
msg12958 (view) Author: Jeff Allen (jeff.allen) Date: 2020-02-01.09:00:18
That's excellent news. Thanks for testing and reporting back to us.
History
Date User Action Args
2020-02-01 09:00:18jeff.allensetstatus: pending -> closed
messages: + msg12958
2020-01-31 11:11:42FraOrolosetmessages: + msg12946
2020-01-26 22:03:03jeff.allensetstatus: open -> pending
assignee: jeff.allen
resolution: accepted -> fixed
messages: + msg12936
2020-01-25 23:31:25jeff.allensetmessages: + msg12922
2020-01-25 11:03:59FraOrolosetmessages: + msg12920
2020-01-25 10:40:13jeff.allensetmessages: + msg12918
2020-01-25 10:38:19jeff.allensetmessages: - msg12916
2020-01-25 10:32:30jeff.allensetmessages: + msg12916
2020-01-24 07:42:02jeff.allensetmessages: + msg12914
2020-01-10 07:44:29jeff.allensetresolution: accepted
messages: + msg12912
2019-12-27 23:32:58jeff.allensetmessages: + msg12910
2019-12-27 18:45:45jeff.allensetmessages: + msg12908
title: import of Java classes is not threadsafe -> import of Java classes is not thread safe
2019-12-26 16:54:31jeff.allensetmessages: + msg12904
2019-12-26 12:24:32jeff.allensetmessages: + msg12902
title: import of Java classes is not threadsafe, probably because of PyJavaType -> import of Java classes is not threadsafe
2019-12-25 14:37:04jeff.allensetmessages: + msg12900
2019-12-25 14:32:31jeff.allensetmessages: + msg12898
2019-12-17 11:20:26FraOrolosetmessages: + msg12850
2019-12-14 14:02:42jeff.allensetmessages: + msg12834
2019-12-13 07:54:20jeff.allensetpriority: normal -> high
messages: + msg12828
milestone: Jython 2.7.2
2019-11-27 09:01:12FraOrolosetmessages: + msg12810
2019-11-26 16:55:50zyasoftsetnosy: + zyasoft
messages: + msg12804
2019-11-25 00:22:37jeff.allensetpriority: normal
nosy: + jeff.allen
messages: + msg12792
2019-11-20 12:55:50FraOrolocreate