Message9278
@Jeff,
It's pretty straightforward: all paths are Unicode in Java. Apparently so are environment variables and their values. PosixModule tries to replicate what's specified (more or less) for os.listdir in https://docs.python.org/2/library/os.html#os.listdir, by intercepting the returned String and making a PyString or PyUnicode as appropriate.
I believe the right choice for us is this: if the path argument is a PyString, return a PyString only when the name is ASCII, and a PyUnicode otherwise, because we don't actually support encoded strings anyway.
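To make the proposed rule concrete, here is a minimal Java sketch of just the decision. It assumes the caller already knows whether the path argument was a str or a unicode. `ListdirTypeRule`, the `Kind` enum (a stand-in for Jython's PyString/PyUnicode) and `resultKind` are hypothetical names, not Jython API.

```java
// Hypothetical sketch of the proposed os.listdir return-type rule.
final class ListdirTypeRule {
    // Stand-ins for Jython's PyString and PyUnicode; not the real classes.
    enum Kind { PY_STRING, PY_UNICODE }

    // True if every UTF-16 char in s is in the 7-bit ASCII range.
    static boolean isAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    // pathIsPyString: whether the argument passed to listdir was a (byte) str.
    // name: a directory entry as returned by Java, already decoded to Unicode.
    static Kind resultKind(boolean pathIsPyString, String name) {
        if (pathIsPyString && isAscii(name)) {
            return Kind.PY_STRING;
        }
        // A unicode path argument, or a non-ASCII name: return unicode.
        return Kind.PY_UNICODE;
    }

    public static void main(String[] args) {
        System.out.println(resultKind(true, "readme.txt"));
        System.out.println(resultKind(true, "\u9996\u9875"));
        System.out.println(resultKind(false, "readme.txt"));
    }
}
```

A unicode path always yields unicode names, matching the Python 2 docs; only the str case depends on the name's content.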
We have a similar problem with os.environ, which is populated by PosixModule.getEnviron:
private static PyObject getEnviron() {
    PyObject environ = new PyDictionary();
    Map<String, String> env;
    try {
        env = System.getenv();
    } catch (SecurityException se) {
        // Access to the environment was denied: return an empty dict.
        return environ;
    }
    for (Map.Entry<String, String> entry : env.entrySet()) {
        // Each key and value is wrapped with Py.newString, even when it
        // contains non-ASCII characters.
        environ.__setitem__(Py.newString(entry.getKey()), Py.newString(entry.getValue()));
    }
    return environ;
}
https://github.com/jythontools/jython/blob/master/src/org/python/modules/posix/PosixModule.java#L896
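One way to see which entries this affects is to scan the map that System.getenv() returns for non-ASCII keys or values. This is a hypothetical diagnostic (`NonAsciiEnviron.affectedKeys` is not part of PosixModule), and it assumes those entries are exactly the ones a byte-oriented PyString can't carry faithfully.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: lists environment entries that are not pure ASCII.
final class NonAsciiEnviron {
    // True if every UTF-16 char in s is in the 7-bit ASCII range.
    static boolean isAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    // Returns, in sorted order, the keys whose key or value has a non-ASCII char.
    static List<String> affectedKeys(Map<String, String> env) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, String> e : new TreeMap<>(env).entrySet()) {
            if (!isAscii(e.getKey()) || !isAscii(e.getValue())) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // As in getEnviron, System.getenv() may throw SecurityException
        // under a security manager.
        System.out.println(affectedKeys(System.getenv()));
    }
}
```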
Note that Python 3 separates out os.environ and os.environb:
https://docs.python.org/3/library/os.html#os.environ
So when I run Python 3 in that directory, os.environ has this entry:
'PWD': '/Users/jbaker/test/unicode/首页'
whereas on Python 2.7:
'PWD': '/Users/jbaker/test/unicode/\xe9\xa6\x96\xe9\xa1\xb5'
Compare Java:
http://docs.oracle.com/javase/7/docs/api/java/lang/System.html#getenv()
And using Jython 2.5 in this same directory:
>>> System.getenv().get("PWD")
u'/Users/jbaker/test/unicode/\u9996\u9875'
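These outputs differ only in representation: the Python 2.7 bytes are the UTF-8 encoding of the same string that Java and Python 3 report. A small Java check, assuming the locale encoding is UTF-8 (typical on OS X). `PwdBytes.py2Escape` is a hypothetical helper that only approximates Python 2's repr; it doesn't escape quotes or backslashes.

```java
import java.nio.charset.StandardCharsets;

// Sketch: encode the Unicode PWD as UTF-8 and print it Python 2 style.
final class PwdBytes {
    // Printable ASCII bytes are shown as-is, all others as \xNN (lowercase hex).
    static String py2Escape(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int v = b & 0xFF;
            if (v >= 0x20 && v < 0x7F) {
                sb.append((char) v);
            } else {
                sb.append(String.format("\\x%02x", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String pwd = "/Users/jbaker/test/unicode/\u9996\u9875";
        byte[] utf8 = pwd.getBytes(StandardCharsets.UTF_8);
        System.out.println(py2Escape(utf8));
        // prints /Users/jbaker/test/unicode/\xe9\xa6\x96\xe9\xa1\xb5
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(pwd));
        // prints true: decoding the bytes recovers the Unicode value
    }
}
```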
And it's all related to that curious entity, the surrogateescape error handler (PEP 383).
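For reference, surrogateescape decodes each byte that isn't valid in the target encoding to a lone surrogate in U+DC80..U+DCFF, so the original bytes can be recovered when encoding. Below is a rough Java approximation for UTF-8. It assumes java.nio's notion of a malformed sequence is close enough to CPython's, which may not hold in every edge case. `SurrogateEscape` is a hypothetical class, not something in Jython.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical approximation of Python 3's UTF-8 + surrogateescape.
final class SurrogateEscape {
    // Decode UTF-8, mapping each undecodable byte 0x80..0xFF to U+DC80..U+DCFF.
    static String decode(byte[] bytes) {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        // UTF-8 never yields more chars than bytes, and escapes are 1:1.
        CharBuffer out = CharBuffer.allocate(bytes.length);
        while (true) {
            CoderResult r = dec.decode(in, out, true);
            if (r.isUnderflow()) {
                break;
            }
            if (!r.isError()) {
                throw new IllegalStateException("unexpected overflow");
            }
            // The malformed bytes start at the input's current position.
            for (int i = 0; i < r.length(); i++) {
                int b = in.get() & 0xFF;
                if (b < 0x80) {
                    // Like Python, refuse to escape ASCII bytes.
                    throw new IllegalArgumentException("cannot escape ASCII byte");
                }
                out.put((char) (0xDC00 + b));
            }
        }
        dec.flush(out);
        out.flip();
        return out.toString();
    }

    // Encode back: escape surrogates become their original bytes, the rest is
    // UTF-8. Other lone surrogates are replaced with '?' by getBytes.
    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        StringBuilder run = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // A low surrogate right after a high one is half of a real pair.
            boolean escaped = c >= 0xDC80 && c <= 0xDCFF
                    && (i == 0 || !Character.isHighSurrogate(s.charAt(i - 1)));
            if (escaped) {
                flushRun(run, out);
                out.write(c - 0xDC00);
            } else {
                run.append(c);
            }
        }
        flushRun(run, out);
        return out.toByteArray();
    }

    private static void flushRun(StringBuilder run, ByteArrayOutputStream out) {
        byte[] b = run.toString().getBytes(StandardCharsets.UTF_8);
        out.write(b, 0, b.length);
        run.setLength(0);
    }

    public static void main(String[] args) {
        // Valid UTF-8 for U+9996 followed by an invalid 0xFF byte.
        byte[] raw = {(byte) 0xE9, (byte) 0xA6, (byte) 0x96, (byte) 0xFF};
        String s = decode(raw);
        System.out.println(Arrays.equals(raw, encode(s)));
    }
}
```

The round trip is the whole point: bytes that can't be decoded still survive as str and come back out unchanged, which is how Python 3 can expose os.environ as str without losing information.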
Date | User | Action | Args
2014-12-30 18:31:04 | zyasoft | set | messageid: <1419964264.61.0.289657775165.issue2239@psf.upfronthosting.co.za>
2014-12-30 18:31:04 | zyasoft | set | recipients: + zyasoft, jeff.allen, Arfrever
2014-12-30 18:31:04 | zyasoft | link | issue2239 messages
2014-12-30 18:31:04 | zyasoft | create |