Message8640

Author zyasoft
Recipients cgroves, fwierzbicki, jeff.allen, pjenvey, yyamano, zyasoft
Date 2014-06-14.00:46:28
Message-id <1402706789.78.0.66327667621.issue1066@psf.upfronthosting.co.za>
In-reply-to
Content
Fixed in http://hg.python.org/jython/rev/6c718e5e9ae9, to the extent possible, by using java.nio.charset.Charset. The following codecs are still not available, which is more or less the list Philip identified in msg3880:

euc_jis_2004
euc_jisx0213
hz
iso2022_jp_1
iso2022_jp_2004
iso2022_jp_3
iso2022_jp_ext
shift_jis_2004
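
As a quick way to reproduce a list like the one above on a given interpreter, the following sketch asks the codec registry about each name. The helper name missing_codecs and the CANDIDATES list are illustrative assumptions, not part of the fix; the only library call relied on is codecs.lookup, which raises LookupError for an unknown encoding.

```python
import codecs

# Codec names taken from the list above.
CANDIDATES = [
    "euc_jis_2004",
    "euc_jisx0213",
    "hz",
    "iso2022_jp_1",
    "iso2022_jp_2004",
    "iso2022_jp_3",
    "iso2022_jp_ext",
    "shift_jis_2004",
]


def missing_codecs(names):
    """Return the subset of names that the codec registry cannot resolve."""
    missing = []
    for name in names:
        try:
            codecs.lookup(name)
        except LookupError:
            # The registry has no codec registered under this name.
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_codecs(CANDIDATES):
        print(name)
```

Which names are reported depends entirely on the interpreter and, for Jython, on the charsets the underlying JVM provides.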

hz could potentially be supported by preprocessing: it encodes each GB2312 character as two 7-bit bytes, with the GB2312 runs delimited by the escape sequences ~{ and ~}. ICU4J might help as well.
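
To make the preprocessing idea concrete, here is a minimal sketch, not the approach taken in the changeset. It rewrites HZ bytes as EUC-CN bytes by setting the high bit on every byte inside a ~{ ... ~} run, so that an ordinary GB2312 decoder can take over. The function name hz_to_euc_cn is hypothetical, and the sketch assumes well-formed input (no error handling for truncated pairs or invalid escapes).

```python
def hz_to_euc_cn(data):
    """Rewrite HZ-encoded bytes as EUC-CN (GB2312) bytes.

    Assumes well-formed input; malformed escapes are passed through.
    """
    src = bytearray(data)
    out = bytearray()
    in_gb = False  # True while inside a ~{ ... ~} run
    i = 0
    while i < len(src):
        pair = bytes(src[i:i + 2])
        if in_gb:
            if pair == b"~}":
                in_gb = False  # back to ASCII mode
            else:
                # Two 7-bit bytes become one EUC-CN character.
                out.extend(b | 0x80 for b in bytearray(pair))
            i += 2
        elif pair == b"~{":
            in_gb = True
            i += 2
        elif pair == b"~~":
            out.append(0x7E)  # escaped literal tilde
            i += 2
        elif pair == b"~\n":
            i += 2  # line continuation, emits nothing
        else:
            out.append(src[i])  # plain ASCII byte
            i += 1
    return bytes(out)


if __name__ == "__main__":
    converted = hz_to_euc_cn(b"abc ~{VPND~} def")
    print(repr(converted.decode("gb2312")))
```

The result can then be handed to whatever GB2312 decoder is available; encoding would need the reverse transformation, which is not shown here.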

We also potentially gain other encodings, such as cp1047 (EBCDIC), as needed by http://bugs.jython.org/issue550200.

The one remaining issue I see is a couple of minor corner cases in how errors are reported for trailing bytes when the input is not final. It's not clear to me what can be done about this, since it seems to be a property of the decoder; at the very least, our unit tests pick it up, so it's visible.
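
For readers unfamiliar with the "final" distinction, the sketch below shows the general incremental-decoding contract using the standard codecs module. It is an illustration only and does not reproduce the specific corner cases mentioned above. The helper name decode_in_chunks and the choice of gb2312 are assumptions made for the example.

```python
import codecs


def decode_in_chunks(chunks, encoding="gb2312"):
    """Decode a sequence of byte chunks, marking only the last one as final."""
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = []
    for index, chunk in enumerate(chunks):
        is_last = index == len(chunks) - 1
        # With final=False, an incomplete trailing byte is buffered
        # rather than reported as an error.
        parts.append(decoder.decode(chunk, final=is_last))
    return u"".join(parts)


if __name__ == "__main__":
    # One two-byte character split across two chunks.
    print(repr(decode_in_chunks([b"\xd6", b"\xd0"])))
    try:
        # The same lead byte with nothing after it is an error once final.
        decode_in_chunks([b"\xd6"])
    except UnicodeDecodeError as exc:
        print("truncated input: %s" % exc)
```

The corner cases concern what a decoder does with such trailing bytes before the final call.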
History
Date                 User     Action  Args
2014-06-14 00:46:29  zyasoft  set     messageid: <1402706789.78.0.66327667621.issue1066@psf.upfronthosting.co.za>
2014-06-14 00:46:29  zyasoft  set     recipients: + zyasoft, cgroves, fwierzbicki, pjenvey, yyamano, jeff.allen
2014-06-14 00:46:29  zyasoft  link    issue1066 messages
2014-06-14 00:46:28  zyasoft  create