Issue2776

classification
Title: PySystemState is not serializable
Type: Severity: normal
Components: Versions:
Milestone:
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: adamburke, filip.drozdowski, stefan.richthofer
Priority: Keywords:

Created on 2019-05-23.22:25:03 by filip.drozdowski, last changed 2019-06-03.22:39:28 by stefan.richthofer.

Messages
msg12532 (view) Author: Filip Drozdowski (filip.drozdowski) Date: 2019-05-23.22:25:03
PySystemState extends PyObject and PyObject implements Serializable, which suggested me that I would be able to serialize PySystemState. Here's a sample code I've used:

    PythonInterpreter interpreter = new PythonInterpreter();
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    ObjectOutputStream out = new ObjectOutputStream(bos);
    out.writeObject(interpreter.getSystemState());

The code above triggers the following exception:

java.io.NotSerializableException: java.lang.reflect.Field

	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
	at java.util.concurrent.ConcurrentHashMap.writeObject(ConcurrentHashMap.java:1413)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1140)
	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)


I stopped the program right before it threw NotSerializableException and it looks like the offending object was "public org.python.core.PyObject org.python.core.PySystemState.__name__", which is of type "java.lang.reflect.Field".
msg12544 (view) Author: Adam Burke (adamburke) Date: 2019-05-27.02:01:05
Though I guess my first reaction is "not a bug but a feature", I guess it might be useful to serialize PySystemState. Any chance you can share your use case, though? Are you saving the state of a python interpreter, or trying to move a running interpreter across a network?
msg12546 (view) Author: Filip Drozdowski (filip.drozdowski) Date: 2019-05-28.17:22:31
I've identified that doing module imports the first time in a newly spawned interpreter takes a long time, and hence I'm trying to save the state of a PythonInterpreter in an in-memory cache. I wasn't sure if reusing the same PySystemState would be thread safe, so I wanted to create a deep copy of it. Then, I would instantiate a new PythonInterpreter with a copy of cached local variables in the namespace (retrieved using getLocals()) and copy of cached PySystemState. Since PySystemState has many properties, I thought it would be safer to use serialization to get a deep copy.
msg12548 (view) Author: Adam Burke (adamburke) Date: 2019-05-30.10:50:59
Startup times have been mentioned by a few people, and I imagine they would benefit everyone. Whether by serialization or other means, perhaps it would be worth tinkering with an API within Jython itself for externalizing the state of an interpreter? Since it seems PySystemState require a patch anyway, and juggling PySystemState, the contents of getlocals(), and so on, seems possibly brittle over time.

If you have any detailed timing traces of interpreter startup and module load times, and details of how you generated them, that would also be useful to a number of people.

One last though - PySystemState holds a number of things statically and initializes them once, which makes it a bit painful to unit test, and may also impact on this sort of thing.
msg12550 (view) Author: Stefan Richthofer (stefan.richthofer) Date: 2019-05-30.13:44:53
Took a look into the code nd found
public PyObject __name__ = new PyString("sys");
(https://github.com/jythontools/jython/blob/master/src/org/python/core/PySystemState.java#L181)
This should be fine. Any idea when and where this gets modified?
Anyway, how can a java.lang.reflect.Field be assigned to a variable of type PyObject? (I know one can sometimes get the JVM do such shitty stuff by abuse of JNI but that's unlikely to happen here)
msg12552 (view) Author: Filip Drozdowski (filip.drozdowski) Date: 2019-05-30.21:29:00
Ok, I looked at the stack trace I posted earlier more in detail. I stopped the application right before it threw NotSerializableException. When writing to ObjectOutputStream (out.writeObject(interpreter.getSystemState())), Java is trying to serialize the system state's "__dict__" property (sys.__dict__). The first item in the "sys.__dict__" is function "getCurrentWorkingDir" which is of type PyReflectedFunction. Its property called "__module__" is of type PyReflectedField. "__module__" has a property called "field" which is of type "class java.lang.reflect.Field". That's where the serialization fails. 

In short, the problematic field is 'interpreter.getSystemState().__dict__.__getitem__("getCurrentWorkingDir").__module__.field' which is of type java.lang.reflect.Field, and it causes serialization to fail.
msg12554 (view) Author: Stefan Richthofer (stefan.richthofer) Date: 2019-06-03.22:23:30
I thought about a solution. We could make field transient and store its class (via getDeclaringClass) and its name (via getName) along with the field. Then initialize the transient field "field" lazily. Class and String are serializable, so this should be fine. If serialization of classes is too complex we could alternatively store the fully qualified name of the class and then lazily init the class via Class.forName.

Do you think that would solve it?

However, I noticed something worse:
In PyReflectedFunction an array of ReflectedArgs is stored. ReflectedArgs contains a plain Object "data". This violates the promize of serializability in PyReflectedFunction. I don't see how to resolve that properly. Only idea I have is to require ReflectedArgs.data to be serializable. Can we change it to "Serializable data"? Thoughts?
msg12556 (view) Author: Stefan Richthofer (stefan.richthofer) Date: 2019-06-03.22:39:27
Okay, it looks like ReflectedArgs.data is only accessed in PyReflectedFunction and PyReflectedConstructor. At these places it is cast to a Method or Constructor object respectively. So Object in this case is just a placeholder for "Method or Constructor". So, java.lang.reflect.Executable would be the proper type of ReflectedArgs.data. That wouldn't be serializable.
However, again we can store the name (getName) and the class (getDeclaringClass). In contrast to Field we would also have to store the parameter types (getParameterTypes) to resolve it uniquely in case of overloading. These are again classes and strings, so it should be fine. How sounds?
History
Date User Action Args
2019-06-03 22:39:28stefan.richthofersetmessages: + msg12556
2019-06-03 22:23:31stefan.richthofersetmessages: + msg12554
2019-05-30 21:29:01filip.drozdowskisetmessages: + msg12552
2019-05-30 13:44:53stefan.richthofersetnosy: + stefan.richthofer
messages: + msg12550
2019-05-30 10:50:59adamburkesetmessages: + msg12548
2019-05-28 17:22:31filip.drozdowskisetmessages: + msg12546
2019-05-27 02:01:05adamburkesetnosy: + adamburke
messages: + msg12544
2019-05-23 22:25:03filip.drozdowskicreate