Message9275

Author zyasoft
Recipients Arfrever, jeff.allen, zyasoft
Date 2014-12-30.16:34:31
Message-id <1419957272.88.0.39093660111.issue2239@psf.upfronthosting.co.za>
In-reply-to
Content
So this bug has been in Jython since at least 2.5:

$ jython25
Jython 2.5.4 (2.5:5ce837b1a1d8+, Dec 30 2014, 09:01:23)
[Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.7.0_21
Type "help", "copyright", "credits" or "license" for more information.
>>> import glob, os
>>> glob.glob("unicode/*")
['unicode/\u9996\u9875']
>>> os.stat('unicode/\u9996\u9875')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 2] No such file or directory: '/Users/jbaker/test/unicode/\\u9996\\u9875'

Jeff's fixes to ensure that no 16-bit characters sneak into PyString simply made it fail earlier, which is a very good thing indeed.

Probably the right thing to do is for os.listdir and similar os functions to return PyUnicode whenever a name contains any character > 127. The bug is obvious in this code from PosixModule.java:

    public static PyList listdir(PyObject path) {
        String absolutePath = absolutePath(path);
        File file = new File(absolutePath);
        String[] names = file.list();

        if (names == null) {
            // Can't read the path for some reason. stat will throw an error if it can't
            // read it either
            FileStat stat = posix.stat(absolutePath);
            // It exists, maybe not a dir, or we don't have permission?
            if (!stat.isDirectory()) {
                throw Py.OSError(Errno.ENOTDIR, path);
            }
            if (!file.canRead()) {
                throw Py.OSError(Errno.EACCES, path);
            }
            throw Py.OSError("listdir(): an unknown error occurred: " + path);
        }

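        PyList list = new PyList();
        // The result type follows the type of the path argument, not the content
        // of each name: a str path yields PyString entries even for non-ASCII names.
        PyString string = (PyString) path;
        for (String name : names) {
            list.append(string.createInstance(name));
        }
        return list;
    }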

The point of string.createInstance(name) is that it constructs a PyString or a PyUnicode depending on the type of the starting path. That is also what caused the quiet bug earlier, and it is what now fails immediately with Jeff's fixes.
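
As a rough illustration of the rule proposed above (return PyUnicode only when a name has a character > 127), here is a minimal standalone sketch. The class and method names (ListdirTypeSketch, isAscii, chooseTypes) are hypothetical and not part of Jython; the sketch only labels each name with the type that would be chosen, so it runs without the Jython jar. How this would actually be wired into PosixModule.listdir is an assumption, noted in the comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Jython code: decides per name which type to emit.
final class ListdirTypeSketch {

    /** Returns true when every char of the name is 7-bit ASCII (<= 127). */
    public static boolean isAscii(String name) {
        for (int i = 0; i < name.length(); i++) {
            if (name.charAt(i) > 127) {
                return false;
            }
        }
        return true;
    }

    /** Labels each name with the type the proposed rule would pick. */
    public static List<String> chooseTypes(String[] names) {
        List<String> types = new ArrayList<String>();
        for (String name : names) {
            // Assumption: inside listdir this would presumably read something like
            //   list.append(isAscii(name) ? new PyString(name) : new PyUnicode(name));
            types.add(isAscii(name) ? "PyString" : "PyUnicode");
        }
        return types;
    }

    public static void main(String[] args) {
        String[] names = {"README.txt", "\u9996\u9875"};
        System.out.println(chooseTypes(names));
    }
}
```

The decision is made per name rather than from the type of the path argument, which is the difference from the createInstance approach.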

The alternative is to try to simulate CPython here and return paths encoded using the underlying filesystem encoding (maybe UTF-8, maybe something else). But that goes too far: Java file paths are already Unicode.

The most important reason, though, is that we should be able to take os.listdir output and use it directly with java.io.File and the like. Java interoperability remains the most important reason for using Jython, after all.
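
To make the interoperability argument concrete, here is a small sketch contrasting the two options. It assumes UTF-8 as the filesystem encoding purely for illustration (the real encoding may differ), and the class and method names are hypothetical. It never touches the disk: it only shows that a Unicode name passes through java.io.File unchanged, while a CPython-style encoded name is a different, longer byte sequence that would have to be decoded before Java could use it.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not Jython code.
final class PathInteropSketch {

    /** The name as bytes in an assumed UTF-8 filesystem encoding. */
    public static byte[] encodeName(String name) {
        return name.getBytes(StandardCharsets.UTF_8);
    }

    /** The name as java.io.File reports it after building a path from it. */
    public static String throughFile(String dir, String name) {
        return new File(dir, name).getName();
    }

    public static void main(String[] args) {
        String name = "\u9996\u9875";
        // Unicode in, Unicode out: no decoding step is needed for Java APIs.
        System.out.println(throughFile("unicode", name).equals(name));
        // The encoded form has one byte per UTF-8 code unit, not one per character.
        System.out.println(name.length() + " chars, " + encodeName(name).length + " bytes");
    }
}
```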