Issue2632
Created on 2017-10-21.07:16:14 by jeff.allen, last changed 2018-11-04.15:01:36 by jeff.allen.
msg11625
Author: Jeff Allen (jeff.allen)
Date: 2017-10-21.07:16:13

As reported in https://github.com/jythontools/jython/issues/90, an attempt to write non-ASCII text via csv results in the infamous:
java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Cannot create PyString with non-byte value
As old-school Python 2 code, this module thinks in bytes and leaves encoding to the user. By convention (?), the content of a CSV file will be interpreted as UTF-8, so clients sensitive to the problem will supply encoded data. Python 3 reverses this philosophy: its csv module works with text (str) and leaves encoding to the file object.
Almost certainly we should use a ByteBuffer where we presently use a StringBuilder, since the file is in binary mode and the user should expect to encode the text before calling csv.writer.writerow().
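For contrast, Python 3's csv module takes str fields and leaves the encoding to the file object. A minimal sketch, assuming UTF-8 as the target encoding and using an in-memory io.BytesIO in place of a real binary file; the sample values are illustrative only:

```python
import csv
import io

# Python 3 sketch: csv.writer is handed text (str); the bytes are produced
# by the UTF-8 text layer wrapped around the binary buffer.
raw = io.BytesIO()  # stands in for a file opened in binary mode
text = io.TextIOWrapper(raw, encoding="utf-8", newline="")  # newline="" as the csv docs advise
writer = csv.writer(text)
writer.writerow(["Jöns", "Malmö"])  # non-ASCII text, no manual encoding by the caller
text.flush()  # push buffered text through the encoder into raw
data = raw.getvalue()
text.detach()  # detach so closing/collecting the wrapper leaves raw alone
print(data)  # b'J\xc3\xb6ns,Malm\xc3\xb6\r\n'
```

The Python 2 csv module (and so Jython's) instead expects each field to arrive already encoded, which is the contract a byte-oriented buffer would reflect.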

msg11631
Author: Jeff Allen (jeff.allen)
Date: 2017-10-25.16:03:31

Actually, non-ASCII text is OK unless you supply it as a unicode object. In that case, we buffer the Java chars (UTF-16) internally and then try to treat the resulting String as bytes, hence the error. If the client supplies a unicode object, I believe we should encode it with the default encoding. In the same circumstances, CPython says something like:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 12: ordinal not in range(128)
So the StringBuilder can stay, but we ought to encode unicode objects as they arrive, if only so that we can fail the way CPython does.
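The "encode as they arrive" idea can be sketched in Python 3 terms. This is a minimal illustration under stated assumptions, not the code of the actual Jython change: encode_field and encode_row are hypothetical helpers, and "ascii" stands in for Python 2's default encoding (what sys.getdefaultencoding() returns there). Quoting and delimiters are left out so that only the encoding step is shown:

```python
def encode_field(value, encoding="ascii"):
    """Turn one csv field into bytes as soon as it arrives.

    Hypothetical helper: 'ascii' stands in for Python 2's default
    encoding, so non-ASCII text fails early with UnicodeEncodeError.
    """
    if isinstance(value, bytes):
        return value  # already encoded by the caller: pass through unchanged
    if not isinstance(value, str):
        value = str(value)  # numbers etc. are formatted as text first
    return value.encode(encoding)  # raises UnicodeEncodeError for non-ASCII under 'ascii'


def encode_row(fields, encoding="ascii"):
    # Encode every field before any of it reaches the row buffer,
    # so the buffer only ever holds byte values.
    return [encode_field(f, encoding) for f in fields]


print(encode_row([b"J\xc3\xb6ns", "Malm", 2017]))  # [b'J\xc3\xb6ns', b'Malm', b'2017']

try:
    encode_row(["Malmö"])
except UnicodeEncodeError as exc:
    print(exc)  # the 'ascii' codec error, analogous to CPython 2's message
```

Because every field is encoded on entry, a char-based buffer such as the StringBuilder only ever sees values that fit in a byte, and unicode input with non-ASCII characters fails with a UnicodeEncodeError at the point of entry.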

msg11670
Author: Jeff Allen (jeff.allen)
Date: 2017-11-21.23:31:50

Possibly fixed in https://hg.python.org/jython/rev/08978c4d1ab0
We now accept unicode objects and write them with the default encoding (like CPython).

Date | User | Action | Args
2018-11-04 15:01:36 | jeff.allen | set | status: pending -> closed; resolution: accepted -> fixed
2017-11-21 23:31:50 | jeff.allen | set | status: open -> pending; resolution: accepted; messages: + msg11670; milestone: Jython 2.7.2
2017-10-25 16:03:31 | jeff.allen | set | messages: + msg11631; title: Handle byte data transparently in csv module -> Handle unicode data appropriately in csv module
2017-10-21 07:16:14 | jeff.allen | create |