Issue1807
Created on 2011-10-11.06:42:29 by ssoldatenko, last changed 2018-03-17.09:16:51 by jeff.allen.
Messages | |||
---|---|---|---|
msg6667 | Author: Sam (ssoldatenko) | Date: 2011-10-11.06:42:28 | |
Failed to set UTF-8 encoding for output redirected to file.
--------x.py--------
#!/usr/bin/python
# -*- coding: UTF-8 -*-
import sys
print sys.stdout.encoding
print sys.getdefaultencoding()
print(u"Ф")
--------x.py--------
=== Output to console works fine ===
$ java -jar jython.jar -Dpython.console.encoding=UTF-8 x.py
UTF-8
ascii
Ф
=== Output to file does not work ===
$ java -jar jython.jar -Dpython.console.encoding=UTF-8 x.py > tmp.txt 2>tmp2.txt ; cat tmp.txt
$ cat tmp.txt
None
ascii
$ cat tmp2.txt
Traceback (most recent call last):
  File "x.py", line 9, in <module>
    print(u"Ф")
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0424' in position 0: ordinal not in range(128) |
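A common workaround for this kind of failure, offered here only as a sketch and not as the fix for this issue, is to stop relying on the stream's reported encoding and wrap the byte stream in an explicit UTF-8 writer. The helper name `utf8_writer` is hypothetical, and an in-memory buffer stands in for the redirected file so that the example is self-contained. It is written for Python 3; in Python 2 the same idea is usually applied to `sys.stdout` itself, and whether that behaves identically under Jython is not established here.

```python
import codecs
import io


def utf8_writer(binary_stream):
    """Wrap a byte stream so that text written to it is encoded as UTF-8.

    Hypothetical helper for illustration; not part of Jython.
    """
    return codecs.getwriter("utf-8")(binary_stream)


# Stand-in for a redirected stdout (assumption: any writable byte stream works).
redirected = io.BytesIO()
out = utf8_writer(redirected)
out.write(u"\u0424\n")  # CYRILLIC CAPITAL LETTER EF, the character printed by x.py
captured = redirected.getvalue()

# The failure in the report corresponds to encoding the same text as ASCII.
try:
    u"\u0424".encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

The explicit writer makes the encoding a property of the program rather than of the terminal, which is why it is unaffected by redirection.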
|||
msg6711 | Author: Irmen de Jong (irmen) | Date: 2011-11-06.22:55:33 | |
In PySystemState.java there is this code fragment in initEncoding:
if (stdStream.isatty()) {
    stdStream.encoding = encoding;
}
So the encoding is only applied if the stream is a tty. With the check removed, your code example works fine. I'm not sure why this check is there. |
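The effect of that check can be paraphrased in Python. This is a sketch of the behaviour described above, not Jython's source: the `StdStream` class, the `init_encoding` function and the `require_tty` flag are all hypothetical names introduced to show the difference between keeping and removing the `isatty()` test.

```python
class StdStream(object):
    """Hypothetical stand-in for a standard stream; not Jython's class."""

    def __init__(self, is_tty):
        self._is_tty = is_tty
        self.encoding = None  # what sys.stdout.encoding would report

    def isatty(self):
        return self._is_tty


def init_encoding(std_stream, encoding, require_tty=True):
    """Apply `encoding` to the stream, optionally only when it is a tty.

    require_tty=True mirrors the quoted fragment; require_tty=False mirrors
    the experiment in which the check was removed.
    """
    if not require_tty or std_stream.isatty():
        std_stream.encoding = encoding
    return std_stream.encoding


console = init_encoding(StdStream(is_tty=True), "UTF-8")
redirected = init_encoding(StdStream(is_tty=False), "UTF-8")
patched = init_encoding(StdStream(is_tty=False), "UTF-8", require_tty=False)
```

With the check in place the redirected stream keeps an encoding of `None`, which matches the `None` written to tmp.txt in the original report.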
|||
msg6717 | Author: Sam (ssoldatenko) | Date: 2011-11-07.06:20:11 | |
I think it is because of the property name '-Dpython.console.encoding=UTF-8'. It is the CONSOLE encoding, not the output encoding... Can we add properties python.stdout.encoding and python.stderr.encoding? Can we then change the stream initialization code accordingly?
python.console.encoding - applied when stdout or stderr is a terminal.
python.stdout.encoding - applied when stdout is not a terminal, or when it is a terminal but python.console.encoding is not set.
python.stderr.encoding - the same rule as python.stdout.encoding, applied to stderr. |
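The proposed precedence can be sketched as a small lookup. Note that python.stdout.encoding and python.stderr.encoding are only proposed in this message and are not known to exist in Jython; the function name `resolve_encoding` and the use of a plain dict for the properties are likewise assumptions made for illustration.

```python
def resolve_encoding(props, stream_name, is_tty):
    """Pick an encoding for 'stdout' or 'stderr' under the proposed rules.

    props: dict of property name to value (a stand-in for Java system properties).
    Returns None when no applicable property is set.
    """
    console = props.get("python.console.encoding")
    if is_tty and console:
        # A terminal with an explicit console encoding uses that encoding.
        return console
    # Not a terminal, or no console encoding set: use the per-stream property.
    return props.get("python.%s.encoding" % stream_name)


props = {
    "python.console.encoding": "UTF-8",
    "python.stdout.encoding": "UTF-16",
}
on_terminal = resolve_encoding(props, "stdout", is_tty=True)
redirected = resolve_encoding(props, "stdout", is_tty=False)
```

Under these rules a redirected stream never picks up the console encoding, so the original report's command line would still need the per-stream property to produce UTF-8 in the file.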
|||
msg11819 | Author: Jeff Allen (jeff.allen) | Date: 2018-03-17.09:16:50 | |
This is awfully old. I used this modified program to investigate:
-------- y.py --------
# Jython issue 1807
# -*- coding: UTF-8 -*-
import sys
try:
    import java.lang
    enc = java.lang.System.getProperty("file.encoding")
except:
    enc = "cp936"
print enc
print sys.stdout.encoding
print sys.getdefaultencoding()
print(u"Ф".encode(enc))
print(u"Ф")
-------- y.py --------
The behaviour of Jython 2.7.2a1 observed on Windows is as expected at the console:
PS iss1807> jython "-Dfile.encoding=cp936" y.py
cp936
ms936
ascii
Ф
Ф
PS iss1807> jython "-Dfile.encoding=utf-8" y.py
utf-8
ms936
ascii
肖
Ф
Any unicode written to a redirected stdout is written to file as UTF-16, little-endian with BOM. (I think we let Java handle it directly.) To my surprise, this encoding is chosen whatever the setting of the Java property file.encoding.
PS iss1807> jython "-Dfile.encoding=ms936" y.py > y936.txt
PS iss1807> filedump -Bx y936.txt
ff fe 6d 00 73 00 39 00    ? ? m . s . 9 .
33 00 36 00 0d 00 0a 00    3 . 6 . . . . .
6d 00 73 00 39 00 33 00    m . s . 9 . 3 .
36 00 0d 00 0a 00 61 00    6 . . . . . a .
73 00 63 00 69 00 69 00    s . c . i . i .
0d 00 0a 00 24 04 0d 00    . . . . $ . . .
0a 00 24 04 0d 00 0a 00    . . $ . . . . .
EOF
This is also how the ascii output from CPython is handled. Maybe it's the shell that is actually doing this? sys.getdefaultencoding() is 'ascii'. It appears that CPython uses this encoding when a unicode object is printed to the redirected stdout (sys.stdout.encoding is None), since the program dies with an encoding error:
PS iss1807> python y.py > ycp.txt
Traceback (most recent call last):
  File "y.py", line 16, in <module>
    print(u"肖")
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0424' in position 0: ordinal not in range(128)
PS iss1807> filedump -Bx ycp.txt
ff fe 63 00 70 00 39 00    ? ? c . p . 9 .
33 00 36 00 0d 00 0a 00    3 . 6 . . . . .
4e 00 6f 00 6e 00 65 00    N . o . n . e .
0d 00 0a 00 61 00 73 00    . . . . a . s .
63 00 69 00 69 00 0d 00    c . i . i . . .
0a 00 24 04 0d 00 0a 00    . . $ . . . . .
EOF
Overall, it feels to me like what we're doing is not wrong, and there is no reason to expect the contents of y.txt to be UTF-8 encoded as opposed to anything else. It may well be at the discretion of the shell and/or Java runtime, in which case fighting with it is likely to be a dispiriting experience. A comparison on Linux would be interesting. |
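The two encoding observations in this message can be checked offline. The sketch below, written for Python 3, takes the bytes of the y936.txt dump exactly as listed above and decodes them as UTF-16, and separately reproduces the console character seen when UTF-8 bytes are displayed on a cp936 console. The names `DUMP_HEX`, `decode_dump` and `mojibake` are hypothetical; nothing here reproduces the shell or Jython behaviour itself.

```python
# Bytes of y936.txt, copied from the filedump output (hex column only).
DUMP_HEX = """
ff fe 6d 00 73 00 39 00
33 00 36 00 0d 00 0a 00
6d 00 73 00 39 00 33 00
36 00 0d 00 0a 00 61 00
73 00 63 00 69 00 69 00
0d 00 0a 00 24 04 0d 00
0a 00 24 04 0d 00 0a 00
"""


def decode_dump(hex_text):
    """Return (raw bytes, decoded text) for a whitespace-separated hex dump."""
    raw = bytes.fromhex("".join(hex_text.split()))
    # The "utf-16" codec reads the BOM to choose the byte order and then drops it.
    return raw, raw.decode("utf-16")


raw, text = decode_dump(DUMP_HEX)
has_le_bom = raw[:2] == b"\xff\xfe"  # FF FE marks little-endian UTF-16

# UTF-8 bytes for U+0424 interpreted by a cp936 console.
mojibake = u"\u0424".encode("utf-8").decode("cp936")
```

The decoded text contains CRLF line endings and U+0424 twice, consistent with both `print` calls having reached the file as the same character regardless of `file.encoding`.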
History | |||
---|---|---|---|
Date | User | Action | Args |
2018-03-17 09:16:51 | jeff.allen | set | nosy: + jeff.allen messages: + msg11819 versions: + Jython 2.7 |
2013-03-05 22:37:02 | amak | set | keywords: - console |
2013-02-25 22:02:28 | amak | set | keywords: + console |
2013-02-25 20:29:15 | fwierzbicki | set | priority: normal nosy: + fwierzbicki versions: + Jython 2.5, - 2.5.2 |
2012-03-19 18:44:32 | amak | set | nosy: + amak |
2011-11-07 06:20:11 | ssoldatenko | set | messages: + msg6717 |
2011-11-06 22:55:33 | irmen | set | nosy: + irmen messages: + msg6711 |
2011-10-11 06:42:29 | ssoldatenko | create |