Issue2513

classification
Title: Standard output is mixed up if Python scripts are evaluated in parallel within one single JVM
Type: behaviour Severity: urgent
Components: Versions: Jython 2.7, Jython 2.5
Milestone: Jython 2.7.2
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: amak, tuska, yocaba
Priority: Keywords:

Created on 2016-08-11.17:09:55 by yocaba, last changed 2017-09-13.21:12:40 by zyasoft.

Files
File name Uploaded Description
PythonEngineTest.java yocaba, 2016-08-11.17:09:54
Messages
msg10891 Author: Doreen Seider (yocaba) Date: 2016-08-11.17:09:54
We are using Jython embedded in a Java application to evaluate Python scripts with the PyScriptEngine. For us, it is essential that Python scripts can be evaluated in parallel within one Java Virtual Machine. We encountered an issue that prevents this: when we evaluate multiple Python scripts in parallel on multiple threads (each with its own PyScriptEngine instance), the standard output of the scripts gets mixed up.

Attached is a Java file demonstrating the issue:
Different scripts are evaluated by multiple threads in parallel, with a new PyScriptEngine instance for each script evaluation. Each script prints a unique index multiple times. A Writer instance injected into each PyScriptEngine instance checks the output of the evaluated script; each Writer is instantiated with the index it should expect.
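The attached PythonEngineTest.java is not reproduced here. As a rough illustration only, a checking Writer of the kind described could look like the sketch below. The class name IndexCheckingWriter and the single-digit index scheme are assumptions, not taken from the attachment: the writer ignores whitespace, and counts every other character that is not its own index as output from a "foreign" script.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical reconstruction: accepts only its expected index digit (plus
 * whitespace) and counts anything else as output from a foreign script.
 */
public class IndexCheckingWriter extends Writer {
    private final char expected;
    private final AtomicInteger foreignChars = new AtomicInteger();

    public IndexCheckingWriter(int expectedIndex) {
        // Assumption: indexes are single digits, so one char identifies a script.
        if (expectedIndex < 0 || expectedIndex > 9) {
            throw new IllegalArgumentException("index must be 0-9");
        }
        this.expected = Character.forDigit(expectedIndex, 10);
    }

    @Override
    public void write(char[] buf, int off, int len) {
        for (int i = off; i < off + len; i++) {
            char c = buf[i];
            if (c != expected && !Character.isWhitespace(c)) {
                foreignChars.incrementAndGet(); // output from some other script
            }
        }
    }

    @Override
    public void flush() {}

    @Override
    public void close() {}

    /** Number of characters received that did not belong to this writer's script. */
    public int foreignCount() {
        return foreignChars.get();
    }

    public static void main(String[] args) throws IOException {
        IndexCheckingWriter w = new IndexCheckingWriter(3);
        w.write("3\n3\n");
        System.out.println("foreign after own output: " + w.foreignCount());     // 0
        w.write("7\n");
        System.out.println("foreign after foreign output: " + w.foreignCount()); // 1
    }
}
```

In the failing scenario, some writers end up with a non-zero foreignCount() after their script finishes.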

Expected behavior: Each Writer instance receives only its expected index, meaning only the output of the Python script it is 'linked' with.

Observed behavior: Writer instances also receive indexes of "foreign" scripts.

(The test was run and failed with Jython 2.5.2 and 2.7.0.)

Is there currently a way to make the script evaluation thread-safe?
msg10892 Author: Alan Kennedy (amak) Date: 2016-08-14.17:09:08
> Is there currently a way to make the script evaluation thread-safe?

Script evaluation is already thread safe.

Furthermore, you have ensured thread safety of your code by using AtomicInteger and synchronization, etc, so that is not the problem.

The problem is that all of your threads are sharing the same output channel, and not synchronizing output to it. You would find exactly the same problem if jython was not involved at all. For example, if you run java code that uses System.out.print from multiple threads, then you will find the output of all of those threads to be interleaved, i.e. randomly mixed up.

There are two solutions to the problem.

1. Synchronize access to stdout, through some form of lock. You could, for example, provide a synchronized method that all scripts must use to generate output. That way, the output from multiple threads will be serialized, since only one thread at a time can have the synchronization lock.

2. Use a separate output channel for every thread. This is the preferred mechanism, because the output for every thread will appear on different channels, and there will be no need to figure out where the output comes from.
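Solution 1 can be sketched in plain Java, with no Jython involved. This is an illustration under stated assumptions: SerializedOutput and printLine are hypothetical names standing in for "a synchronized method that all scripts must use", and each thread plays the role of one script printing its index.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Solution 1 sketch: all "scripts" send output through one synchronized method.
 * SerializedOutput and printLine are hypothetical names, not part of Jython.
 */
public class SerializedOutput {
    private final StringBuilder sink = new StringBuilder();

    /** Appends one complete line while holding the lock, so lines cannot mix. */
    public synchronized void printLine(String line) {
        sink.append(line).append('\n');
    }

    public synchronized String contents() {
        return sink.toString();
    }

    /** Runs threadCount threads (at most 10, one digit each), each printing linesPerThread lines. */
    static SerializedOutput runDemo(int threadCount, int linesPerThread) {
        SerializedOutput out = new SerializedOutput();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            final String index = String.valueOf(i);
            Thread t = new Thread(() -> {
                for (int n = 0; n < linesPerThread; n++) {
                    out.printLine(index.repeat(20)); // one unit of "script output"
                }
            });
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        String text = runDemo(4, 100).contents();
        boolean clean = text.lines().allMatch(l -> l.chars().distinct().count() == 1);
        System.out.println(text.lines().count() + " lines, none mixed: " + clean); // 400 lines, none mixed: true
    }
}
```

The lock only makes each call atomic. If a script produces its output through several calls, output from different scripts can still alternate between those calls, which is one reason the separate-channel approach is preferred.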

However, to do this, you must use the native PythonInterpreter constructor, rather than using PyScriptEngine.

All output channels are associated with a PySystemState object, which is essentially the equivalent of the python "sys" module, including sys.stdout and sys.stderr.

When a PythonInterpreter object is created, you can pass a PySystemState object to it, which means that it will have its own unique copy of sys.stdout and sys.stderr.

http://www.jython.org/javadoc/org/python/util/PythonInterpreter.html#PythonInterpreter(org.python.core.PyObject,%20org.python.core.PySystemState)

This is simple to do: see this code for an example of how to do it.

https://hg.python.org/jython/file/tip/src/com/xhaus/modjy/ModjyJServlet.java#l109
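The structure of solution 2 can be modelled in stdlib Java. In this sketch each evaluation gets a fresh runner that owns a private output buffer, and the runner is discarded afterwards. IsolatedRunner is a hypothetical stand-in for a PythonInterpreter constructed with its own PySystemState, and "running a script" here just prints the index. The sketch does not call Jython itself.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Solution 2 sketch: every evaluation gets its own output channel.
 * IsolatedRunner is hypothetical; it models an interpreter with private stdout.
 */
public class IsolatedRunner {
    private final StringWriter buffer = new StringWriter();        // this runner's private stdout
    private final PrintWriter out = new PrintWriter(buffer, true);

    void runScript(int index, int repetitions) {
        for (int i = 0; i < repetitions; i++) {
            out.println(index);
        }
    }

    String output() {
        out.flush();
        return buffer.toString();
    }

    /** Evaluates `scripts` stand-in scripts on a thread pool, one fresh runner each. */
    static List<String> runInParallel(int scripts) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> results = new ArrayList<>();
            for (int i = 0; i < scripts; i++) {
                final int index = i;
                results.add(pool.submit(() -> {
                    IsolatedRunner runner = new IsolatedRunner(); // created per script, then dropped
                    runner.runScript(index, 50);
                    return runner.output();
                }));
            }
            List<String> outputs = new ArrayList<>();
            for (Future<String> f : results) {
                outputs.add(f.get());
            }
            return outputs;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> outputs = runInParallel(8);
        for (int i = 0; i < outputs.size(); i++) {
            boolean clean = outputs.get(i).lines().allMatch(String.valueOf(i)::equals);
            System.out.println("script " + i + " clean: " + clean);
        }
    }
}
```

Because no state is shared between runners, no locking is needed and each output can be attributed to its script without inspection.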

Some notes.

1. This cannot be done through PyScriptEngine, which shares the same PySystemState object across all PythonInterpreters.

2. Using the builtin jython embedding machinery gives you far more control over your jython interpreters than PyScriptEngine, whose JSR 223 interface is generic to every embeddable language, and does not provide this level of fine control.

3. PythonInterpreter objects are fairly lightweight, so you don't need to be too concerned about using a lot of them, e.g. creating one for every user script that you evaluate and discarding it when finished.

Hope this helps,

Alan.
msg10899 Author: Alan Kennedy (amak) Date: 2016-08-19.17:06:26
Did you resolve your issue?

Is there any reason to leave this open?

There is no actual bug in jython.
msg10901 Author: Doreen Seider (yocaba) Date: 2016-08-22.11:25:59
Thanks for your reply. We cross-tested solution 2) and it worked as expected and in the way we need.

The approach we applied using the PyScriptEngine is based on the Java Scripting API defined by JSR 223. With regard to your note 1): does that mean the Java Scripting API in general doesn't allow implementations to separate the standard output of two script evaluations performed by different threads?

I'm asking because we'd prefer to implement against the Java Scripting API to easily switch between script languages later on without modifying any code on our side.
msg10902 Author: Alan Kennedy (amak) Date: 2016-08-22.12:54:28
JSR 223 does have facilities for dealing with these situations.

All ScriptEngine "eval" calls can take an optional ScriptContext which defines, among other things, the input and output channels. PyScriptEngine also supports this method.

https://hg.python.org/jython/file/tip/src/org/python/jsr223/PyScriptEngine.java#l30

However, ScriptContext is just an interface, and cannot be instantiated.

So you could write your own implementation of ScriptContext, which can provide separate input and output channels for every call to the eval method.

The good thing about this is that your implementation of ScriptContext would be completely reusable: you could use the same class for every JSR 223 language you might want to support.

http://docs.oracle.com/javase/8/docs/api/javax/script/ScriptContext.html

You could simplify things by using SimpleScriptContext.

http://docs.oracle.com/javase/8/docs/api/javax/script/SimpleScriptContext.html
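A minimal sketch of this suggestion using only the standard javax.script API: a fresh SimpleScriptContext per eval call, carrying its own output and error writers. The helper name contextWithOwnOutput is hypothetical. The sketch assumes an engine registered under the name "python" (as Jython's JSR 223 factory is expected to be) and skips evaluation when none is on the classpath. Note that later in this thread, msg10903 reports that with PyScriptEngine in Jython 2.7.0 this alone did not keep outputs apart.

```java
import java.io.StringWriter;
import java.io.Writer;
import javax.script.ScriptContext;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;
import javax.script.SimpleScriptContext;

/** Sketch: give each eval call its own ScriptContext with private writers. */
public class PerCallContext {

    /** Builds a fresh context; SimpleScriptContext starts with its own empty ENGINE_SCOPE bindings. */
    static ScriptContext contextWithOwnOutput(Writer out, Writer err) {
        SimpleScriptContext ctx = new SimpleScriptContext();
        ctx.setWriter(out);       // where the script's standard output should go
        ctx.setErrorWriter(err);  // where the script's standard error should go
        return ctx;
    }

    public static void main(String[] args) throws ScriptException {
        // Assumption: an engine named "python" (e.g. Jython's) is on the classpath.
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("python");
        if (engine == null) {
            System.out.println("no 'python' engine on the classpath; skipping eval");
            return;
        }
        StringWriter out = new StringWriter();
        engine.eval("print 42", contextWithOwnOutput(out, new StringWriter()));
        System.out.println("captured: " + out);
    }
}
```

The helper is language-neutral, which is the reusability point made above. Whether the writers are actually honoured per call depends on the engine implementation.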

Hope this helps.

Alan.
msg10903 Author: Doreen Seider (yocaba) Date: 2016-08-23.06:51:32
Thanks again for your help. We tried passing a new ScriptContext instance to each call of the eval method, using the SimpleScriptContext implementation as suggested, and injected a new Writer instance for standard output into each ScriptContext instance. But the result stays the same: the outputs of the evaluated scripts get interleaved if the eval methods (including the instantiation of PyScriptEngine, ScriptContext, and Writer) are called in parallel in different threads.

From your note 1) ("This cannot be done through PyScriptEngine, which shares the same PySystemState object across all PythonInterpreters.") and also from the code of PyScriptEngine and PythonInterpreter I wouldn't expect that to work either.
Referring to version 2.7.0: in the constructor of PyScriptEngine, a new instance of PythonInterpreter is created with its static factory method threadLocalStateInterpreter(). That method passes null for the PySystemState to the PythonInterpreter constructor. When the PySystemState is null, the PythonInterpreter constructor obtains the PySystemState with Py.getSystemState(). As I don't see any later changes to the PySystemState field in the class, I'd assume that all PyScriptEngine instances share the same PySystemState, regardless of whether a new ScriptContext instance is set on the engine or passed to the eval method. But I might simply be missing something here.
msg10916 Author: Alan Kennedy (amak) Date: 2016-08-27.13:15:27
> In the constructor of the PyScriptEngine a new instance of PythonInterpreter
> is created with the help of its static factory method
> threadLocalStateInterpreter(). The method passes null for the PySystemState
> to the constructor of PythonInterpreter. In the case of a null PySystemState
> the PythonInterpreter constructor instantiates the PySystemState with
> Py.getSystemState(). As I don't see any changes to the PySystemState field
> later on in the class, I'd assume that all PyScriptEngine instances share
> the same PySystemState no matter if a new ScriptContext instance is set to
> the engine or passed to the eval method or not. But I simply might miss
> something here.

I don't think you're missing anything, I think you're exactly right.

The PyScriptEngine does indeed create every PythonInterpreter with a shared PySystemState.

ScriptContext is supposed to allow you to separate the input and output of every PyScriptEngine.eval call. But the current PyScriptEngine does not implement this properly.

Instead, when you give it a ScriptContext, it takes the output handler of the ScriptContext and sets the output handler of the embedded PythonInterpreter to that handler.

The implementation of PythonInterpreter.setOut then sets the output handler of its PySystemState to that output handler.

https://hg.python.org/jython/file/tip/src/org/python/util/PythonInterpreter.java#l196

So the fact that all PythonInterpreter objects created by PyScriptEngine share the same PySystemState means that you cannot, using PyScriptEngine, separate the output in the way that you require.
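The mechanism described above can be reduced to a small plain-Java model. This is not Jython code: SystemState, Interpreter and SHARED are hypothetical stand-ins for PySystemState, PythonInterpreter and the state returned by Py.getSystemState(). It shows why setting one engine's writer redirects another engine's output when the state is shared, and why a private state per interpreter avoids it.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

/** Simplified model of shared vs. private system state (hypothetical names). */
public class SharedStateModel {

    /** Stand-in for PySystemState: holds the current stdout. */
    static final class SystemState {
        volatile Writer stdout;
    }

    /** Stand-in for the single shared state handed out when null is passed. */
    static final SystemState SHARED = new SystemState();

    /** Stand-in for a PythonInterpreter. */
    static final class Interpreter {
        private final SystemState state;

        Interpreter(SystemState state) {
            this.state = (state != null) ? state : SHARED; // null means "use the shared state"
        }

        void setOut(Writer w) {
            state.stdout = w; // rewires the state, which may be shared with other interpreters
        }

        void print(String s) {
            try {
                state.stdout.write(s);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        StringWriter outA = new StringWriter();
        StringWriter outB = new StringWriter();
        Interpreter a = new Interpreter(null);
        Interpreter b = new Interpreter(null);
        a.setOut(outA); // engine A installs its context's writer...
        b.setOut(outB); // ...then engine B replaces it on the same shared state
        a.print("A");   // so A's output lands in B's writer
        System.out.println("shared:  outA='" + outA + "' outB='" + outB + "'");  // shared:  outA='' outB='A'

        StringWriter outC = new StringWriter();
        StringWriter outD = new StringWriter();
        Interpreter c = new Interpreter(new SystemState()); // private state per interpreter
        Interpreter d = new Interpreter(new SystemState());
        c.setOut(outC);
        d.setOut(outD);
        c.print("C");
        d.print("D");
        System.out.println("private: outC='" + outC + "' outD='" + outD + "'"); // private: outC='C' outD='D'
    }
}
```

In a real multi-threaded run, the order of the setOut and print calls varies, which matches the random interleaving observed in the original report.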

I'd call that a bug in jython's JSR 223 implementation.
History
Date User Action Args
2017-09-13 21:12:40 zyasoft set milestone: Jython 2.7.2
2017-06-13 12:05:10 tuska set nosy: + tuska
2016-08-27 13:15:28 amak set messages: + msg10916
2016-08-23 06:51:33 yocaba set messages: + msg10903
2016-08-22 12:54:29 amak set messages: + msg10902
2016-08-22 11:26:00 yocaba set messages: + msg10901
2016-08-19 17:06:26 amak set messages: + msg10899
2016-08-14 17:09:11 amak set messages: + msg10892
2016-08-14 16:43:09 amak set nosy: + amak
2016-08-11 17:09:55 yocaba create