Message8818
The last minor change at http://hg.python.org/jython/rev/44191dd20f5a completes this (I think). I've changed the way we record the console encoding so that we preserve the name as originally specified or deduced, rather than the Java-canonical name. I can now get the intended behaviour:
Active code page: 936
>dist\bin\jython -Dpython.console=org.python.core.PlainConsole
Jython 2.7b3+ (default:44191dd20f5a, Jun 24 2014, 22:42:40)
[Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.7.0_51
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdin.encoding
'ms936'
>>> sys._jy_console.getEncoding()
u'ms936'
>>> sys._jy_console.getEncodingCharset()
x-mswin-936
>>> u = u'\u756b\u86c7\u6dfb\u8db3'
>>> print u
畫蛇添足
>>> s = "使用"
>>> s
'\xca\xb9\xd3\xc3'
>>> print s
使用
>>> raw_input('畫蛇')
畫蛇
''
>>> raw_input('畫蛇: ')
畫蛇: 添足
'\xcc\xed\xd7\xe3'
>>> raw_input('畫蛇: ').decode("gbk")
畫蛇: 添足
u'\u6dfb\u8db3'
It is an odd quirk that there are two Chinese codecs in Java: specifying ms936 will get you the codec with canonical name x-mswin-936, while specifying cp936 will get you one called GBK. ms936 is what we retrieve from java.io.Console, see http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/windows/native/java/io/Console_md.c#l54. Either of these gets the Python GBK codec, and that's the one necessary for print to work.
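As a small check of the last point, the sketch below asks Python's codec registry what each of the two names resolves to, then decodes the bytes that raw_input() returned in the session above. The helper name python_codec_name is made up for illustration, and the sketch assumes a Python whose standard alias table knows both ms936 and cp936 (CPython's does); it is not taken from the Jython change itself.

```python
from __future__ import print_function
import codecs


def python_codec_name(console_encoding):
    """Return the canonical Python codec name for a console encoding name.

    Hypothetical helper for illustration; it only wraps codecs.lookup().
    """
    return codecs.lookup(console_encoding).name


# The two names the JVM may hand us for code page 936.
for name in ("ms936", "cp936"):
    print(name, "->", python_codec_name(name))

# Bytes returned by raw_input() in the session above.
raw = b'\xcc\xed\xd7\xe3'

# Decoding under either name should give the same text.
text_ms = raw.decode("ms936")
text_cp = raw.decode("cp936")
print(text_ms == text_cp)
```

If both names map to the same codec, it does not matter to print which of them sys.stdin.encoding reports, as long as it is one that Python can look up.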
In test_ntpath and test_macpath, test_nonascii_abspath() complains of an invalid directory name, which I suspect is due to using the default (and therefore multibyte) encoding. This is so far from the original complaint, and may be a fault in the test anyway, that I feel justified in not holding the bug open for it.
It is worth observing that the default JLineConsole does not work with multi-byte encodings. One can work around that by setting -Dpython.console=org.python.core.PlainConsole on the command line, or by setting the same python.console property in the Jython registry file.
Rose:
Were you able to build from source? If so, I think your use of Jython with this code page will be a better test than anything I have done. You may find other faults in our MBCS support, but I'm hopeful that it won't be in the console encoding.
Date | User | Action | Args
2014-06-24 22:09:46 | jeff.allen | set | messageid: <1403647786.39.0.776538708927.issue2123@psf.upfronthosting.co.za>
2014-06-24 22:09:46 | jeff.allen | set | recipients: + jeff.allen, zyasoft, rpan, kasso
2014-06-24 22:09:46 | jeff.allen | link | issue2123 messages
2014-06-24 22:09:45 | jeff.allen | create |