Issue1745880

classification
Title: PyArray.tostring() charset error, iso-8859-1
Type: Severity: normal
Components: Core Versions:
Milestone:
process
Status: closed Resolution: invalid
Dependencies: Superseder:
Assigned To: Nosy List: cgroves, donghp1979
Priority: normal Keywords:

Created on 2007-07-01.02:47:47 by donghp1979, last changed 2007-07-01.09:03:23 by cgroves.

Messages
msg1694 (view) Author: donghp1979 (donghp1979) Date: 2007-07-01.02:47:47
The iso-8859-1 charset is too limited. Converting a jarray('b') to a string can produce a garbled error message, for example in socket.py.

I think it should be changed to java.nio.charset.Charset.defaultCharset().

That would probably be right in most situations.

I found this error while using the 'gbk' charset (Chinese).
class PyArray

    ...........
    public String tostring() {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try {
            toStream(bos);
        } catch(IOException e) {
            throw Py.IOError(e);
        }
        try {
            // The returned string is used as a Python str with values
            // from 0-255.  iso-8859-1 maps the byte values into that range.
            return new String(bos.toByteArray(), "iso-8859-1");
        } catch (UnsupportedEncodingException e) {
            throw Py.JavaError(e);
        }
    }
msg1695 (view) Author: Charlie Groves (cgroves) Date: 2007-07-01.09:03:23
As I'm trying to explain in the comment, iso-8859-1 isn't really being used as an encoding here.  It maps Java byte values, -128 to 127, into the 0-255 values of a Python str, which is stored as a string.  It's a fixed operation that has nothing to do with the local encoding.  The earlier behavior of using the local encoding was what was broken.

You can call decode(<yourcharset>) on the str returned from this method to turn it into a unicode object, interpreting its bytes in the encoding you desire.
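
To illustrate the point about the fixed byte mapping, here is a minimal Java sketch. It is not Jython's actual code: the class name Latin1RoundTrip and its helper methods are hypothetical, and main assumes the running JRE ships the GBK charset. It shows that iso-8859-1 turns every byte into the char with the same unsigned value, so the original bytes can always be recovered and then interpreted in whatever charset they were really written in.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class for illustration only; not part of Jython.
public class Latin1RoundTrip {

    // Maps each byte (-128..127) to the char with the same unsigned value (0..255).
    static String bytesToPyStr(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Inverse mapping: chars 0..255 back to the original bytes.
    static byte[] pyStrToBytes(String pyStr) {
        return pyStr.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Interprets the byte-per-char string as text in the given charset.
    static String decode(String pyStr, Charset charset) {
        return new String(pyStrToBytes(pyStr), charset);
    }

    public static void main(String[] args) {
        // Assumption: the JRE provides GBK (full JDK builds normally do).
        Charset gbk = Charset.forName("GBK");
        String text = "\u4e2d\u6587"; // two Chinese characters
        byte[] raw = text.getBytes(gbk);

        String pyStr = bytesToPyStr(raw);

        // One char per byte, regardless of the platform's default charset.
        System.out.println(pyStr.length() == raw.length);
        // The original bytes survive the trip unchanged.
        System.out.println(Arrays.equals(raw, pyStrToBytes(pyStr)));
        // Decoding with the real charset recovers the text.
        System.out.println(text.equals(decode(pyStr, gbk)));
    }
}
```

Using the platform default charset in place of iso-8859-1 would make the result depend on the machine's locale, and byte sequences that are not valid in that charset could not be recovered afterwards.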
History
Date User Action Args
2007-07-01 02:47:47 donghp1979 create