Message8818

Author jeff.allen
Recipients jeff.allen, kasso, rpan, zyasoft
Date 2014-06-24.22:09:45
Message-id <1403647786.39.0.776538708927.issue2123@psf.upfronthosting.co.za>
Content
The last minor change at http://hg.python.org/jython/rev/44191dd20f5a completes this (I think). I've changed the way we record the console encoding so that we preserve the original name, as specified or deduced, rather than the Java-canonical name. I can now get the intended behaviour:
Active code page: 936

>dist\bin\jython -Dpython.console=org.python.core.PlainConsole
Jython 2.7b3+ (default:44191dd20f5a, Jun 24 2014, 22:42:40)
[Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.7.0_51
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdin.encoding
'ms936'
>>> sys._jy_console.getEncoding()
u'ms936'
>>> sys._jy_console.getEncodingCharset()
x-mswin-936
>>> u = u'\u756b\u86c7\u6dfb\u8db3'
>>> print u
畫蛇添足
>>> s = "使用"
>>> s
'\xca\xb9\xd3\xc3'
>>> print s
使用
>>> raw_input('畫蛇')
畫蛇
''
>>> raw_input('畫蛇: ')
畫蛇: 添足
'\xcc\xed\xd7\xe3'
>>> raw_input('畫蛇: ').decode("gbk")
畫蛇: 添足
u'\u6dfb\u8db3'
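
The last step of the session generalises: raw_input() returns bytes in the console encoding, so decoding with the name reported by sys.stdin.encoding avoids hard-coding "gbk". A minimal sketch, in which the helper name decode_console_bytes and the fallback to "ascii" are assumptions made for illustration and not part of Jython:

```python
import sys


def decode_console_bytes(raw, encoding=None):
    """Decode bytes read from the console into a unicode string.

    Hypothetical helper: uses the given encoding if supplied, else
    sys.stdin.encoding, else 'ascii' (the fallback is an assumption,
    not documented Jython behaviour).
    """
    if encoding is None:
        encoding = getattr(sys.stdin, "encoding", None) or "ascii"
    return raw.decode(encoding)


# Bytes taken from the session above, decoded with the name Jython reported.
text = decode_console_bytes(b'\xcc\xed\xd7\xe3', "ms936")
assert text == u'\u6dfb\u8db3'
```

Passing the encoding explicitly, as in the last two lines, keeps the example independent of whatever console it happens to run in.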

It is an odd quirk that Java has two codecs for this code page: specifying ms936 gets you the codec with canonical name x-mswin-936, while specifying cp936 gets you one called GBK. ms936 is the name we retrieve from java.io.Console, see http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/windows/native/java/io/Console_md.c#l54. Either name resolves to the Python GBK codec, and that is the one print needs in order to work.
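
The Python side of that mapping can be checked directly. The sketch below relies on the alias table in the standard library's encodings package, where both names point at gbk; that Jython's copy of the library matches CPython's here is an assumption, and python_codec_name is a hypothetical helper name:

```python
import codecs


def python_codec_name(alias):
    """Return the canonical name of the Python codec an alias resolves to."""
    return codecs.lookup(alias).name


# Both names that Java may hand back resolve to the same Python codec.
for alias in ("ms936", "cp936"):
    assert python_codec_name(alias) == "gbk"

# Round trip using the bytes shown in the session above.
assert u'\u4f7f\u7528'.encode("ms936") == b'\xca\xb9\xd3\xc3'
```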

In test_ntpath and test_macpath, test_nonascii_abspath() complains of an invalid directory name, which I suspect is due to using the default (therefore multibyte) encoding. This is so far from the original complaint, and may be a fault in the test anyway, that I feel justified in not holding the bug open for it.

It is worth observing that the default JLineConsole does not work with multi-byte encodings. One can work around that by selecting the plain console, either on the command line with -Dpython.console=org.python.core.PlainConsole or by making the same setting in the Jython registry file.

Rose:

Were you able to build from source? If so, I think your use of Jython with this code page will be a better test than anything I have done. You may find other faults in our MBCS support, but I'm hopeful that they won't be in the console encoding.