Message8618

Author jeff.allen
Recipients jeff.allen, rpan, zyasoft
Date 2014-06-09.23:03:26
Message-id <1402355006.53.0.238796594888.issue2123@psf.upfronthosting.co.za>
In-reply-to
Content
Got it. The parser uses a Java codec, so a literal string has already been decoded from the console by the Java x-mswin-936 codec. But a literal (byte) string should contain the bytes equivalent to it in the input encoding. So the parser has to reverse its own decoding, and is trying to do that with the (non-existent) Python codec. Using the Java codec for the reverse step as well is more respectable, and it fixes the hang on input.

>dist\bin\jython -Dpython.console=
Jython 2.7b3+ (default:6cee6fef06f0+, Jun 9 2014, 23:22:52)
[Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.7.0_51
Type "help", "copyright", "credits" or "license" for more information.
>>> "xx"
'xx'
>>> "畫蛇添足"
'\xae\x8b\xc9\xdf\xcc\xed\xd7\xe3'
>>> u"畫蛇添足"
u'\u756b\u86c7\u6dfb\u8db3'
>>> exit()
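
The byte string in the session above can be cross-checked outside Jython. This is only a sketch, assuming CPython 3, where a codec named "cp936" exists (it is the one missing from Jython here) and is taken to be equivalent to Java's x-mswin-936 for these four characters:

```python
# Assumption: CPython 3, whose "cp936" codec stands in for Java's x-mswin-936.
# The code points are the ones shown by the u"..." literal in the session.
text = "\u756b\u86c7\u6dfb\u8db3"

# Reverse step: turn the already-decoded characters back into the bytes
# that a plain (byte) string literal should contain.
raw = text.encode("cp936")
print(repr(raw))

# Forward step: what the console/parser decoding does to those bytes.
round_tripped = raw.decode("cp936")
print(round_tripped == text)
```

On CPython the first line printed should match the bytes shown for the plain literal in the session.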

This doesn't work with the default JLineConsole as that seems to have no idea about multibyte characters.

Output is still failing, as that really does need the codecs from #1066.

I'll push this small change after tests, and then think about how to avoid the non-Python name "x-mswin-936".

On the wrapping issue, Jim: if someone defined a codec in Python, then used it as the source encoding, it would be necessary to be able to create a Java codec from it, since the parser has to use it to do the decoding in a Reader. In the present design, that is.
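
For comparison only, here is how the same "codec as a decoding reader" idea looks on the Python side. It is a sketch under assumptions: CPython 3, the standard codecs registry, and a hypothetical helper name open_source_reader. It is not Jython's parser code, which needs a Java Reader instead.

```python
import codecs
import io


def open_source_reader(byte_stream, encoding_name):
    """Hypothetical helper: wrap a byte stream in a decoding reader.

    Any codec registered with the codecs module, including one defined
    in Python, can be looked up by name and used this way.
    """
    reader_factory = codecs.getreader(encoding_name)
    return reader_factory(byte_stream)


# Source bytes as they might arrive from a file in the cp936 encoding.
source_bytes = b'x = "\xae\x8b\xc9\xdf"\n'
reader = open_source_reader(io.BytesIO(source_bytes), "cp936")
decoded = reader.read()
print(repr(decoded))
```

The Java-side equivalent would have to present the Python codec through a Java charset so that a Reader could drive it, which is the wrapping in question.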
History
Date                 User        Action  Args
2014-06-09 23:03:26  jeff.allen  set     messageid: <1402355006.53.0.238796594888.issue2123@psf.upfronthosting.co.za>
2014-06-09 23:03:26  jeff.allen  set     recipients: + jeff.allen, zyasoft, rpan
2014-06-09 23:03:26  jeff.allen  link    issue2123 messages
2014-06-09 23:03:26  jeff.allen  create