Issue2527

classification
Title: cStringIO throws IllegalArgumentException with non-ASCII values
Type: behaviour Severity: normal
Components: Library Versions: Jython 2.7
Milestone: Jython 2.7.1
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: stefan.richthofer Nosy List: stefan.richthofer, yan12125
Priority: normal Keywords:

Created on 2016-10-20.13:12:52 by yan12125, last changed 2017-02-27.04:50:09 by zyasoft.

Messages
msg10968 Author: Yen Chi Hsuan (yan12125) Date: 2016-10-20.13:12:51
$ ./dist/bin/jython   
Jython 2.7.1b3 (default:a07c595b410f, Oct 20 2016, 21:07:16) 
[OpenJDK 64-Bit Server VM (Oracle Corporation)] on java1.7.0_111
Type "help", "copyright", "credits" or "license" for more information.
>>> import cStringIO
>>> s = cStringIO.StringIO(u'中文')
>>> s.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
        at org.python.core.PyString.<init>(PyString.java:64)
        at org.python.core.PyString.<init>(PyString.java:70)
        at org.python.modules.cStringIO$StringIO.read(cStringIO.java:225)
        at org.python.modules.cStringIO$StringIO.read(cStringIO.java:198)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)

java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Cannot create PyString with non-byte value

From https://docs.python.org/2/library/stringio.html:
Unlike the StringIO module, this module is not able to accept Unicode strings that cannot be encoded as plain ASCII strings.

Python 2 handles it well:

$ python2
Python 2.7.12 (default, Jun 28 2016, 08:31:05) 
[GCC 6.1.1 20160602] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cStringIO
>>> s = cStringIO.StringIO(u'中文')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

Jython should throw a Python exception instead of a Java one in this case.
msg10969 Author: Yen Chi Hsuan (yan12125) Date: 2016-10-20.13:14:39
Downstream bug report: https://github.com/rg3/youtube-dl/issues/10975
msg11039 Author: Stefan Richthofer (stefan.richthofer) Date: 2017-01-27.16:57:05
I guess I could easily fix this issue. However, one decision puzzles me:
CPython also rejects extended ASCII characters in cStringIO, i.e. ordinal not in range(128). In contrast, Jython used to throw an exception only if ordinal not in range(256) (see PyString.isBytes).

The example code, however, fails in both cases, since it's not even extended ASCII. Anyway, which version should trigger the UnicodeEncodeError?
On one hand, I'd say stick to CPython behavior. On the other hand, I wouldn't want to break existing code that might have been tested with the former Jython behavior. Opinions?
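
To make the two thresholds concrete, here is a minimal sketch in plain Python. `fits` is a hypothetical helper written only for illustration; it is not part of Jython or CPython, and it merely mirrors the range(128) and range(256) rules described above.

```python
def fits(text, limit):
    """Return True if every character in text has an ordinal below limit."""
    return all(ord(ch) < limit for ch in text)

# ASCII only, one extended (Latin-1) character, and the CJK example from the report.
samples = [u'abc', u'\xe9', u'\u4e2d\u6587']

# For each sample: (passes the range(128) rule, passes the range(256) rule)
results = [(fits(s, 128), fits(s, 256)) for s in samples]

for s, r in zip(samples, results):
    print('%r -> %r' % (s, r))
```

Only the middle sample is treated differently by the two rules, which is the case the question is about.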
msg11051 Author: Yen Chi Hsuan (yan12125) Date: 2017-02-01.10:45:30
+1 for sticking to CPython's behavior. CPython is stricter, so I guess existing code that is compatible with both Jython and CPython doesn't use extended ASCII characters.

Even for Jython-only projects, I guess people won't use extended ASCII characters. In CPython, extended ASCII characters are no different from other non-ASCII characters (e.g., CJK characters). If a function does not support generic Unicode, most likely it supports pure ASCII only.
msg11052 Author: Stefan Richthofer (stefan.richthofer) Date: 2017-02-01.13:35:56
Also, I dug into CPython behavior a bit deeper, and it does the check according to its default encoding (which is 'ascii'). In Jython it's also 'ascii', and one can set it via codecs.setDefaultEncoding or PySystemState.setdefaultencoding. By setting it to 'latin-1', the former Jython behavior (accepting ordinals in range(256) instead of range(128)) can be restored.
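
The effect of the encoding choice can be approximated without touching the interpreter-wide default, as a rough sketch. `encodable` is a hypothetical helper for illustration only; it passes the codec name explicitly to `encode` rather than calling any of the default-encoding setters mentioned above.

```python
def encodable(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

extended = u'\xe9'        # a single extended (Latin-1) character
cjk = u'\u4e2d\u6587'     # the string from the original report

print(encodable(extended, 'ascii'))    # False: ordinal not in range(128)
print(encodable(extended, 'latin-1'))  # True: ordinal in range(256)
print(encodable(cjk, 'latin-1'))       # False: ordinals not in range(256)
```

Under this approximation, switching from 'ascii' to 'latin-1' changes the outcome only for ordinals 128 through 255; the string from the report is rejected either way.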
msg11053 Author: Stefan Richthofer (stefan.richthofer) Date: 2017-02-01.18:43:49
Fixed as of https://github.com/jythontools/jython/commit/0b38e392c5adcf9c97a611a2051b02171e3a4764.

I took the opportunity to copy the docstrings from CPython's cStringIO.c to cStringIO.java (https://github.com/jythontools/jython/commit/42552597e00703b2115b5c58ecf7855468f70522).
History
Date User Action Args
2017-02-27 04:50:09  zyasoft  set  status: pending -> closed
2017-02-01 18:43:49  stefan.richthofer  set  status: open -> pending; messages: + msg11053
2017-02-01 13:35:56  stefan.richthofer  set  messages: + msg11052; milestone: Jython 2.7.2 -> Jython 2.7.1
2017-02-01 10:45:31  yan12125  set  messages: + msg11051
2017-01-31 00:15:43  stefan.richthofer  set  priority: normal; assignee: stefan.richthofer; type: crash -> behaviour
2017-01-27 16:57:06  stefan.richthofer  set  nosy: + stefan.richthofer; messages: + msg11039
2016-10-27 12:22:35  zyasoft  set  resolution: accepted; milestone: Jython 2.7.2
2016-10-20 13:14:39  yan12125  set  messages: + msg10969
2016-10-20 13:12:52  yan12125  create