Created on 2012-02-13.07:37:37 by pekka.klarck, last changed 2015-01-23.20:26:01 by akira.
|msg6779 (view)||Author: Pekka Klärck (pekka.klarck)||Date: 2012-02-13.07:37:36|
With Jython 2.5.2 and earlier, sys.getfilesystemencoding() always returns None. This breaks code that tries to encode or decode strings at the system boundary. Example uses include decoding received command line arguments and encoding or decoding environment variables when setting or getting them. A working sys.getfilesystemencoding() could apparently also fix os.stat on Windows (issue #1658).

I have tested that, at least on Ubuntu Linux and on WinXP with a Western locale, the value returned by java.lang.System.getProperty('file.encoding') seems to be the correct encoding to use. On Ubuntu I get UTF-8 both with that approach and with Python using sys.getfilesystemencoding(). On Windows file.encoding is Cp1252 and sys.getfilesystemencoding() on Python returns mbcs. Both of these are fine, as the former is the actual encoding and the latter is a special encoding that the operating system later translates to the correct encoding. Notice also that Jython doesn't support mbcs.

Based on my experimentation, I propose that sys.getfilesystemencoding() be implemented using java.lang.System.getProperty('file.encoding').
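A minimal sketch of the workaround this proposal implies for user code, not the actual Jython implementation. The helper names filesystem_encoding and decode_argument and the "utf-8" default are assumptions made for illustration; only sys.getfilesystemencoding() and java.lang.System.getProperty('file.encoding') come from the report above.

```python
import sys


def filesystem_encoding(default="utf-8"):
    """Best-effort encoding name for strings crossing the system boundary.

    Hypothetical helper: falls back to the JVM's file.encoding property
    when sys.getfilesystemencoding() returns None (as on Jython 2.5.2).
    """
    encoding = sys.getfilesystemencoding()
    if encoding is None:
        try:
            # This import only succeeds when running on Jython.
            from java.lang import System
        except ImportError:
            return default
        encoding = System.getProperty("file.encoding") or default
    return encoding


def decode_argument(raw, encoding=None):
    """Decode a byte string (e.g. a command line argument) to text."""
    if isinstance(raw, bytes):
        return raw.decode(encoding or filesystem_encoding())
    return raw  # already text, nothing to decode
```

On CPython the first branch normally returns a real encoding name, so the Java lookup is only reached where the interpreter reports None.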
|msg6948 (view)||Author: Pekka Klärck (pekka.klarck)||Date: 2012-03-21.05:37:07|
It turned out that using the 'file.encoding' property doesn't always work, because Jython doesn't support all the encodings supported by the JVM. That ought to be pretty easy to fix, though, and I submitted a separate issue #1865 about it.
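One way user code could guard against a JVM encoding name that the Python side does not know is to look it up in the codec registry first. This is only a sketch: the function name usable_encoding and the "utf-8" fallback are assumptions, and it says nothing about how issue #1865 was or will be resolved.

```python
import codecs


def usable_encoding(name, default="utf-8"):
    """Return the canonical codec name for `name`, or `default` if unknown.

    Hypothetical helper: `name` would typically be the value of the JVM's
    file.encoding property.
    """
    try:
        return codecs.lookup(name).name
    except (LookupError, TypeError):
        # LookupError: no such codec; TypeError: name was None.
        return default
```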
|msg8309 (view)||Author: Jim Baker (zyasoft)||Date: 2014-04-25.16:44:58|
Need to fix for 2.7; a number of libraries we use depend on sys.getfilesystemencoding(). We can also remove the Jython-specific version of SimpleHTTPServer once this is resolved.
|msg9292 (view)||Author: Jim Baker (zyasoft)||Date: 2015-01-04.17:27:57|
The title of the issue is currently misleading, given that per Python 2.7 docs (https://docs.python.org/2/library/sys.html#sys.getfilesystemencoding):

> getfilesystemencoding()
> Return the name of the encoding used to convert Unicode filenames into system file names, or None if the system default encoding is used.

This is different than the file.encoding system property. Although I was unable to find an authoritative source on this as a standard property, conventionally this sets http://docs.oracle.com/javase/7/docs/api/java/nio/charset/Charset.html#defaultCharset(); see http://stackoverflow.com/questions/361975/setting-the-default-java-character-encoding
|msg9293 (view)||Author: Jim Baker (zyasoft)||Date: 2015-01-04.17:28:24|
Changing this current behavior is much, much harder than it first appears. I have partially addressed it with the fix for #2239, but the problem is that the file system encoding for Jython is in some sense None - Jython simply uses Unicode paths, much like Java.

Also returning None is considered correct behavior: "returns None if the system default encoding is used" (https://docs.python.org/2/library/sys.html#sys.getfilesystemencoding)

Supporting anything else is a very big issue once we consider Java integration. In general, any solution should ensure that the following code snippet would always work:

[java.io.File(p).exists() for p in os.listdir()]

regardless of how wrapped these calls (java.io.File or os.listdir) might actually be. Note that this is somewhat similar to Windows which uses "mbcs" for its file system encoding.

Also this problem goes away more or less with Jython 3.

Set priority accordingly to low: there is no straightforward perfect fix, which makes sense because it's an integration issue.
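The invariant described above can be approximated outside Jython as a small check. This sketch substitutes os.path.lexists for java.io.File(p).exists(), which is an assumption made so that it runs on any Python; the function name listing_is_consistent is hypothetical.

```python
import os


def listing_is_consistent(directory):
    """Return True if every name reported by os.listdir() can be found again.

    os.path.lexists stands in for java.io.File(p).exists(); lexists is used
    so that dangling symlinks do not count as failures.
    """
    return all(
        os.path.lexists(os.path.join(directory, name))
        for name in os.listdir(directory)
    )
```

The interesting inputs are directories containing names that are not valid in the reported file system encoding, since those are the names most likely to be altered between listing and lookup.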
|msg9317 (view)||Author: Arfrever Frehtes Taifersar Arahesis (Arfrever)||Date: 2015-01-06.19:09:02|
> Also this problem goes away more or less with Jython 3.

CPython 3 still has sys.getfilesystemencoding() and losslessly (i.e. without REPLACEMENT CHARACTERs) supports bytes paths...

$ rm -fr /tmp/some_dir
$ mkdir /tmp/some_dir
$ touch /tmp/some_dir/ś
$ touch /tmp/some_dir/$'\x80'
$ touch /tmp/some_dir/aaa$'\x80\x81\x82\x83'aaa
$ python3.5 -c 'import os; print(os.listdir(b"/tmp/some_dir"))'
[b'aaa\x80\x81\x82\x83aaa', b'\xc5\x9b', b'\x80']
$ python3.5 -c 'import os; print(os.listdir("/tmp/some_dir"))'
['aaa\udc80\udc81\udc82\udc83aaa', 'ś', '\udc80']
$ LC_ALL="C" python3.5 -c 'import os; print(os.listdir("/tmp/some_dir"))'
['aaa\udc80\udc81\udc82\udc83aaa', '\udcc5\udc9b', '\udc80']
$ python3.5 -c 'import sys; print(sys.getfilesystemencoding())'
utf-8
$ LC_ALL="C" python3.5 -c 'import sys; print(sys.getfilesystemencoding())'
ascii
|msg9318 (view)||Author: Arfrever Frehtes Taifersar Arahesis (Arfrever)||Date: 2015-01-06.19:23:43|
Although:

>>> "\udcc5\udc9b" == "ś"
False

But both paths refer to the same file:

>>> os.path.exists("/tmp/some_dir/\udcc5\udc9b")
True
>>> os.path.exists("/tmp/some_dir/ś")
True
>>> os.stat("/tmp/some_dir/\udcc5\udc9b") == os.stat("/tmp/some_dir/ś")
True
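The behaviour shown above follows from the surrogateescape error handler that Python 3 applies to file names. The sketch below reproduces it with plain encode and decode calls, without touching the file system; the names RAW_NAME, to_bytes, escaped and decoded are illustrative only.

```python
# UTF-8 bytes of U+015B (the "s with acute" used in the file name above).
RAW_NAME = b"\xc5\x9b"


def to_bytes(name, encoding="utf-8"):
    """Encode a text file name the way Python 3 does for OS calls."""
    return name.encode(encoding, "surrogateescape")


# With an ASCII file system encoding, each undecodable byte becomes a
# lone surrogate in the range U+DC80..U+DCFF.
escaped = RAW_NAME.decode("ascii", "surrogateescape")

# With a UTF-8 file system encoding, the same bytes decode normally.
decoded = RAW_NAME.decode("utf-8", "surrogateescape")

# The two text strings differ, but both map back to the same bytes,
# which is why both paths refer to the same file.
same_bytes = to_bytes(escaped) == to_bytes(decoded) == RAW_NAME
```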
|msg9448 (view)||Author: (akira)||Date: 2015-01-23.20:26:01|
sys.getfilesystemencoding() can't be None since Python 3.2   https://docs.python.org/3/library/sys.html#sys.getfilesystemencoding
messages: + msg9448
|2015-01-06 19:23:43||Arfrever||set||messages: + msg9318|
|2015-01-06 19:09:02||Arfrever||set||messages: + msg9317|
|2015-01-06 18:47:28||Arfrever||set||nosy: + Arfrever|
|2015-01-04 17:28:25||zyasoft||set||messages: + msg9293|
|2015-01-04 17:27:58||zyasoft||set||priority: low|
messages: + msg9292
messages: + msg8309
|2013-02-26 23:45:48||amak||set||nosy: + amak|
|2013-02-26 18:12:28||fwierzbicki||set||nosy: + fwierzbicki|
|2012-03-21 05:37:07||pekka.klarck||set||messages: + msg6948|