Issue1745880

classification

Title:	PyArray.tostring() charset error, iso-8859-1
Type:		Severity:	normal
Components:	Core	Versions:
		Milestone:

process

Status:	closed	Resolution:	invalid
Dependencies:		Superseder:
Assigned To:		Nosy List:	cgroves, donghp1979
Priority:	normal	Keywords:

Created on 2007-07-01.02:47:47 by donghp1979, last changed 2007-07-01.09:03:23 by cgroves.

Messages
msg1694	Author: donghp1979 (donghp1979)	Date: 2007-07-01.02:47:47
The iso-8859-1 charset is too limited: converting a jarray('b') with it may produce a garbled message, for example in socket.py. I think it should be changed to use java.nio.charset.Charset.defaultCharset(), which would probably be right in most situations. I found this error while using the 'gbk' charset (Chinese).

class PyArray ...........

    public String tostring() {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try {
            toStream(bos);
        } catch(IOException e) {
            throw Py.IOError(e);
        }
        try {
            // The returned string is used as a Python str with values
            // from 0-255. iso-8859-1 maps the byte values into that range.
            return new String(bos.toByteArray(), "iso-8859-1");
        } catch (UnsupportedEncodingException e) {
            throw Py.JavaError(e);
        }
    }
msg1695	Author: Charlie Groves (cgroves)	Date: 2007-07-01.09:03:23
As I'm trying to explain in the comment, iso-8859-1 isn't really being used as an encoding here. It maps Java byte values, -128 to 127, onto the character values 0-255 of a Python str, which is stored as a Java String. It's a fixed operation that has nothing to do with the local encoding; the fact that the local encoding was used before was a bug. You can call decode(<yourcharset>) on the str returned from this method to turn it into a unicode object using the encoding you want.
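
The fixed byte-to-character mapping described above can be illustrated with a minimal standalone sketch. This is not Jython code: the class name Latin1RoundTrip and the helpers bytesToStr and strToBytes are hypothetical, and it assumes Java 7 or later for java.nio.charset.StandardCharsets (the original code passes the charset name "iso-8859-1" as a string instead). The sample bytes are the GBK encoding of two Chinese characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of Jython.
public class Latin1RoundTrip {

    // Maps each Java byte (-128..127) to exactly one char with value 0..255.
    static String bytesToStr(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Inverse mapping: each char 0..255 goes back to a single byte.
    static byte[] strToBytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+4E2D U+6587 encoded as GBK: D6 D0 CE C4.
        byte[] gbk = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};

        String raw = bytesToStr(gbk);
        System.out.println(raw.length());                        // one char per byte
        System.out.println(Arrays.equals(gbk, strToBytes(raw))); // no byte was lost

        // Interpreting the bytes as text is a separate, explicit step.
        // GBK may be absent from a minimal Java runtime, so check first.
        if (Charset.isSupported("GBK")) {
            System.out.println(new String(strToBytes(raw), Charset.forName("GBK")));
        }
    }
}
```

Because the mapping is lossless in both directions, the choice of a real charset such as GBK can be deferred to the caller. Decoding with the platform default charset inside tostring() would instead make the result depend on the machine it runs on.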

History
Date	User	Action	Args
2007-07-01 02:47:47	donghp1979	create