Issue2402

classification
Title: Python str should be converted to Java byte[] by default
Type: behaviour Severity: normal
Components: Core Versions: Jython 2.7
Milestone:
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: darjus, holmis83, zyasoft
Priority: Keywords:

Created on 2015-09-23.14:58:37 by holmis83, last changed 2015-10-07.05:10:32 by darjus.

Messages
msg10282 Author: Per Holmberg (holmis83) Date: 2015-09-23.14:58:36
When a str object is returned from a Python script to Java through the JSR223 interface, it is converted to a Java String. It should be converted to a Java byte[], since a Python str is really a plain sequence of bytes. The current behaviour is error-prone; for example, printing a Java String that was converted from a UTF-8-encoded str gives an incorrect result.

Python unicode should still be converted to Java String by default, as it is now.
msg10286 Author: Jim Baker (zyasoft) Date: 2015-09-24.00:13:17
FWIW, Jython does not actually convert the string; instead, it returns the unboxed string held by PyString (or PyUnicode). See PyString#getString.

Regardless, the requested change as-is would be a backwards breaking change in user APIs, something we don't want to do in 2.7.x, at least without very good justifications. (Examples of justifications to break APIs in this cycle would be preventing resource leaks, etc. These do not apply here.)

Unfortunately, from what I can tell, there is no way to get the underlying PyObject using the JSR 223 APIs that we support. Instead all outbound objects are converted via __tojava__(Object.class); in the case of PyString, we see the following test immediately at the top of its __tojava__ method:

        if (c.isAssignableFrom(String.class)) {
            return getString();
        }

which for Object is always true.

Otherwise, we could provide you the desired object by simply calling pyobj.__tojava__(byte[].class), which is handled by the PyString#__tojava__ method implementation.
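
To illustrate why the quoted test always succeeds for Object.class but not for byte[].class, here is a minimal sketch using only the JDK. The class and method names (AssignableDemo, wouldReturnString) are hypothetical and are not part of Jython; the method only mirrors the shape of the check quoted above.

```java
// Hypothetical demo class; plain JDK only, no Jython classes involved.
class AssignableDemo {

    // Mirrors the shape of the quoted check: could a String be handed back
    // for the requested target type?
    static boolean wouldReturnString(Class<?> requested) {
        return requested.isAssignableFrom(String.class);
    }

    public static void main(String[] args) {
        // Object is a supertype of String, so the String branch is taken.
        System.out.println(wouldReturnString(Object.class));
        // byte[] is unrelated to String, so the check fails and other
        // conversions get a chance to run.
        System.out.println(wouldReturnString(byte[].class));
    }
}
```

Because JSR 223 always asks for Object.class, the first case is the only one reachable through that interface, which is why the byte[] conversion cannot be requested from there.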

Perhaps we can offer an extended version of the JSR 223 API to do so? It will have to be considered carefully. Presumably we would start with http://docs.oracle.com/javase/7/docs/api/javax/script/Bindings.html and maybe overload get to take the conversion type.
msg10296 Author: Per Holmberg (holmis83) Date: 2015-09-25.14:54:41
Thank you, Jim, for your detailed answer.

As a work-around, I found the following which seems to do the trick:

    from org.python.core.util import StringUtil
    return StringUtil.toBytes(myStr)

If there is a better way, let me know.

Cheers.
msg10299 Author: Jim Baker (zyasoft) Date: 2015-09-29.03:37:23
Per, that sounds like a good solution for your case. In particular, PyString#__tojava__ uses the same utility method when converting to byte[].class.
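
For readers on the Java side, the following sketch shows roughly what goes wrong and how the bytes can be recovered, using only the JDK. It rests on two assumptions that are not confirmed in this thread: that the String handed back for a Python str carries one char (0..255) per original byte, and that the byte conversion is equivalent to ISO-8859-1 encoding. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not call into Jython.
class StrBytesDemo {

    // Assumption: models a byte-oriented str exposed as a Java String,
    // with each byte widened to one char in the range 0..255.
    static String bytesAsChars(byte[] raw) {
        StringBuilder sb = new StringBuilder(raw.length);
        for (byte b : raw) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    // ISO-8859-1 maps chars 0..255 back to the same byte values.
    static byte[] charsAsBytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is two bytes in UTF-8: 0xC3 0xA9.
        byte[] utf8 = "\u00e9".getBytes(StandardCharsets.UTF_8);

        // Seen as a String, the two bytes become two unrelated characters.
        String seenFromJava = bytesAsChars(utf8);
        System.out.println(seenFromJava.length());

        // Recovering the bytes and decoding them as UTF-8 restores the text.
        String decoded = new String(charsAsBytes(seenFromJava), StandardCharsets.UTF_8);
        System.out.println(decoded.equals("\u00e9"));
    }
}
```

If the String arrives in Java already, the same ISO-8859-1 round trip can be applied there instead of in the script, under the same assumptions.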
msg10310 Author: Darjus Loktevic (darjus) Date: 2015-10-07.05:10:32
Closing as workaround provided.
History
Date User Action Args
2015-10-07 05:10:32 darjus set status: open -> closed; resolution: wont fix; messages: + msg10310; nosy: + darjus
2015-09-29 03:37:23 zyasoft set messages: + msg10299
2015-09-25 14:54:42 holmis83 set messages: + msg10296
2015-09-24 00:13:18 zyasoft set nosy: + zyasoft; messages: + msg10286
2015-09-23 14:58:37 holmis83 create