Title: os.stat fails on Windows if path contains chars with ordinal over 255
Type: Severity: normal
Components: Versions:
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: amak, fwierzbicki, pekka.klarck, pjenvey
Priority: Keywords:

Created on 2010-09-26.21:23:44 by pekka.klarck, last changed 2013-03-01.00:22:18 by amak.

msg6097 (view) Author: Pekka Klärck (pekka.klarck) Date: 2010-09-26.21:23:42
Here's a simple example demonstrating the problem:

import os
for c in 254, 255, 256:
    f = unichr(c)+'.txt'
    open(f, 'w').close()
    print repr(f), 'exists', os.path.exists(f)

When the above code is run on Linux, it reports all created files as existing and exits cleanly. On Windows you get this:

u'\xfe.txt' exists True
u'\xff.txt' exists True
u'\u0100.txt' exists False
Traceback (most recent call last):
  File "", line 6, in <module>
  File "C:\jython2.5.1\Lib\", line 478, in stat
    return stat_result.from_jnastat(_posix.stat(abs_path))
  File "C:\jython2.5.1\Lib\", line 103, in error
    raise OSError(err, strerror(err), asPyString(msg))
OSError: [Errno 2] No such file or directory: 'D:\\\u0100.txt'

As the failing os.path.exists in the above code already illustrated, a bug in os.stat is pretty annoying because so many other methods depend on it. I originally noticed this problem when shutil.rmtree didn't work.
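
To illustrate why a broken os.stat shows up as a wrong answer from os.path.exists rather than as an exception, here is a minimal sketch of an existence check built on stat. The name exists_via_stat is hypothetical and used only for illustration; the standard library's own implementation may differ in detail.

```python
import os

def exists_via_stat(path):
    """Report whether path exists by asking os.stat and swallowing OSError."""
    try:
        os.stat(path)
    except OSError:
        # Any stat failure, including a spurious "No such file or directory",
        # is reported as "does not exist".
        return False
    return True

print(exists_via_stat('.'))
```

Any function layered on stat in this way inherits the bug silently, which fits the report that shutil.rmtree was affected as well.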

The error occurs when the stat method calls _posix.stat. This _posix is returned by a factory and in my case it was WindowsPOSIX. I was able to make our project's acceptance tests pass with this workaround:

import os
import sys
# On Jython for Windows, replace the native POSIX layer with JavaPOSIX.
if sys.platform.startswith('java') and os.sep == '\\':
    os._posix = os.JavaPOSIX(os.PythonPOSIXHandler())
    os._native_posix = False

Could someone who knows Jython internals better comment on whether this workaround is valid? Hopefully this can be fixed in 2.5.2.
msg6098 (view) Author: Philip Jenvey (pjenvey) Date: 2010-09-26.23:04:38
I rewrote the posix module for 2.5.2 and the stat function was overhauled. Can you try this on there? It may already be fixed.
msg6099 (view) Author: Pekka Klärck (pekka.klarck) Date: 2010-09-26.23:09:51
I forgot to mention earlier that I tested this with 2.5.2 beta 2 and the bug appears there as well.

Philip, do you think the workaround I presented is valid? Even if this gets fixed in 2.5.2, we need to support people who have 2.5.1 installed.
msg6129 (view) Author: Philip Jenvey (pjenvey) Date: 2010-10-03.22:05:47
This is stat's fault; open creates the file with the correct filename.

The underlying stat impl is Windows _stat64 via jnr-posix. The fix for this might be to use _wstat64 instead, though CPython apparently uses a different API call for win32 stat: GetFileAttributesExW.
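
As a rough illustration of why the cutoff sits at ordinal 255, the sketch below checks which of the three test filenames can be represented in a single-byte encoding. latin-1 is used here only because it covers exactly ordinals 0 to 255; which encoding the narrow _stat64 call actually receives is an assumption, not something established in this thread. The helper name encodable is hypothetical.

```python
try:
    unichr          # Python 2 / Jython 2.5
except NameError:
    unichr = chr    # Python 3

def encodable(name, encoding):
    """Return True if name can be encoded in the given encoding without error."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

# The same three filenames as in the original report.
names = [unichr(c) + u'.txt' for c in (254, 255, 256)]
results = [encodable(n, 'latin-1') for n in names]

for n, ok in zip(names, results):
    print('%r fits in latin-1: %s' % (n, ok))
```

A wide-character call such as _wstat64 takes the name as UTF-16 and so would not depend on any single-byte encoding.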

Another solution might be to support sys.getfilesystemencoding and encode the filename first, but the JVM doesn't even seem to support the 'mbcs' encoding?? (correct me if I'm wrong)

Jython's os.listdir with a str arg also differs from what CPython returns for this file. I guess this should be expected since Jython lacks a sys.getfilesystemencoding() value on Windows

We return:

>>> os.listdir('.')
['\xfe.txt', '\u0100.txt', '\xff.txt']
>>> os.listdir(u'.')
[u'\xfe.txt', u'\u0100.txt', u'\xff.txt']

CPython 2.5:
>>> os.listdir('.')
['\xfe.txt', 'A.txt', '\xff.txt']
>>> os.listdir(u'.')
[u'\xfe.txt', u'\u0100.txt', u'\xff.txt']

You get 'A.txt' from u'\u0100.txt'.encode('mbcs') ('mbcs' being sys.getfilesystemencoding())
msg6150 (view) Author: Pekka Klärck (pekka.klarck) Date: 2010-10-06.22:49:06
I don't have enough knowledge to comment on what the right way to fix this is. If there's no better solution, using JavaPOSIX instead of WindowsPOSIX seems to work. Apparently the latter provides more functionality, but I think this bug is too severe to be left unfixed.

os.listdir returning different bytes on Jython than on CPython is also discussed in issue #1593.
msg6156 (view) Author: Philip Jenvey (pjenvey) Date: 2010-10-07.21:20:38
So CPython has the 'mbcs' encoding as a generic name for the current Windows code page (CP_ACP) -- meaning mbcs could be one of many encodings depending on your locale. It also uses Windows system APIs for the encoding/decoding.

I'm not sure why it works this way -- maybe it's so CPython doesn't have to formally map all the various Windows encodings (including some of the odd Windows specific ones) to real encodings. Or maybe some of those encodings aren't supported on all Windows platforms.

The JVM's file.encoding property is derived from the current user's locale. The JVM maps the locale to one of its internal encodings. However it looks like it may fall back to UTF-8 in some cases.

So the JVM's file.encoding property could potentially be our filesystemencoding value on Windows. Would it be 100% reliable though?

And maybe we'd want to emulate the mbcs encoding for compatibility's sake?
msg6252 (view) Author: Pekka Klärck (pekka.klarck) Date: 2010-11-17.11:11:19
The same problem appears also with Jython 2.5.2rc2.

Unfortunately my earlier workaround doesn't work anymore with this release, because `JavaPOSIX` and friends are not exposed in the `os` module anymore. Is there any chance that the underlying bug will be fixed before 2.5.2 final, or should I try to find another workaround?
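
For code that has to run on both 2.5.1 and later releases, a guarded form of the earlier workaround might look like the sketch below. The function name apply_stat_workaround is hypothetical. The sketch only avoids an AttributeError where `JavaPOSIX` is no longer exposed; it does not fix the bug on those releases.

```python
import os
import sys

def apply_stat_workaround():
    """Swap in JavaPOSIX on Jython for Windows if the hooks are available.

    Returns True if the swap was applied, False otherwise.
    """
    on_jython_windows = sys.platform.startswith('java') and os.sep == '\\'
    # These names were reachable through os in 2.5.1 but not in 2.5.2rc2.
    has_hooks = hasattr(os, 'JavaPOSIX') and hasattr(os, 'PythonPOSIXHandler')
    if not (on_jython_windows and has_hooks):
        return False
    os._posix = os.JavaPOSIX(os.PythonPOSIXHandler())
    os._native_posix = False
    return True

applied = apply_stat_workaround()
```

Callers can check the return value to decide whether another workaround is still needed.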
msg6780 (view) Author: Pekka Klärck (pekka.klarck) Date: 2012-02-13.07:38:52
Based on my experimentation, java.lang.System.getProperty('file.encoding') returns the correct encoding to use. I submitted a separate issue, #1839, about implementing sys.getfilesystemencoding() using it.
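
A minimal sketch of that idea, assuming sys.getfilesystemencoding() returns nothing useful on the affected Jython releases. The function name guess_filesystem_encoding and the 'utf-8' default are hypothetical choices for illustration, not what Jython implements.

```python
import sys

def guess_filesystem_encoding(default='utf-8'):
    """Return an encoding name to use for filenames.

    Prefer sys.getfilesystemencoding(); if it is unset, fall back to the
    JVM's file.encoding property when running on Jython.
    """
    enc = sys.getfilesystemencoding()
    if enc:
        return enc
    try:
        from java.lang import System  # importable only on Jython
    except ImportError:
        return default
    return System.getProperty('file.encoding') or default

print(guess_filesystem_encoding())
```

Whether file.encoding is reliable in every locale is the open question raised earlier in this thread, so the fallback should be treated as a best guess.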
Date                 User          Action  Args
2013-03-01 00:22:18  amak          set     nosy: + amak
2013-02-26 17:36:22  fwierzbicki   set     nosy: + fwierzbicki
2012-02-13 07:38:52  pekka.klarck  set     messages: + msg6780
2010-11-17 11:11:19  pekka.klarck  set     messages: + msg6252
2010-10-07 21:20:39  pjenvey       set     messages: + msg6156
2010-10-06 22:49:07  pekka.klarck  set     messages: + msg6150
2010-10-03 22:05:47  pjenvey       set     messages: + msg6129
2010-09-26 23:09:52  pekka.klarck  set     messages: + msg6099
2010-09-26 23:04:38  pjenvey       set     nosy: + pjenvey
                                           messages: + msg6098
2010-09-26 21:23:44  pekka.klarck  create