Issue1658

classification
Title: os.stat fails on Windows if path contains chars with ordinal over 255
Type:
Severity: normal
Components:
Versions:
process
Status: open
Resolution: accepted
Dependencies:
Superseder:
Assigned To: zyasoft
Nosy List: amak, fwierzbicki, pekka.klarck, pjenvey, zyasoft
Priority: high
Keywords:

Created on 2010-09-26.21:23:44 by pekka.klarck, last changed 2015-01-16.13:35:32 by pekka.klarck.

Messages
msg6097 Author: Pekka Klärck (pekka.klarck) Date: 2010-09-26.21:23:42
Here's a simple example demonstrating the problem:

import os
for c in 254, 255, 256:
    f = unichr(c)+'.txt'
    open(f, 'w').close()
    print repr(f), 'exists', os.path.exists(f)
    os.stat(f)

When the above code is run on Linux, it reports all created files as existing and exits cleanly. On Windows you get this:

D:\>jython os_stat_bug.py
u'\xfe.txt' exists True
u'\xff.txt' exists True
u'\u0100.txt' exists False
Traceback (most recent call last):
  File "os_stat_bug.py", line 6, in <module>
    os.stat(f)
  File "C:\jython2.5.1\Lib\os.py", line 478, in stat
    return stat_result.from_jnastat(_posix.stat(abs_path))
  File "C:\jython2.5.1\Lib\os.py", line 103, in error
    raise OSError(err, strerror(err), asPyString(msg))
OSError: [Errno 2] No such file or directory: 'D:\\\u0100.txt'

As the failing os.path.exists in the above code already illustrated, a bug in os.stat is pretty annoying because so many other methods depend on it. I originally noticed this problem when shutil.rmtree didn't work.

The error occurs when the stat method calls _posix.stat. This _posix is returned by a factory, and in my case it was WindowsPOSIX. I was able to make our project's acceptance tests pass with this workaround:

if sys.platform.startswith('java') and os.sep == '\\':
    os._posix = os.JavaPOSIX(os.PythonPOSIXHandler())
    os._native_posix = False

Could someone who knows Jython internals better comment on whether this workaround is valid? Hopefully this can be fixed in 2.5.2.
msg6098 Author: Philip Jenvey (pjenvey) Date: 2010-09-26.23:04:38
I rewrote the posix module for 2.5.2 and the stat function was overhauled. Can you try this there? It may already be fixed.
msg6099 Author: Pekka Klärck (pekka.klarck) Date: 2010-09-26.23:09:51
I forgot to mention earlier that I tested this and the bug also appears with 2.5.2 beta 2.

Philip, do you think the workaround I presented is valid? Even if this gets fixed in 2.5.2, we need to support people who have 2.5.1 installed.
msg6129 Author: Philip Jenvey (pjenvey) Date: 2010-10-03.22:05:47
This is stat's fault; open creates the file with the correct filename.

The underlying stat implementation is Windows _stat64 via jnr-posix. The fix for this might be to use _wstat64 instead, though CPython apparently uses a different API call for win32 stat: GetFileAttributesExW.

Another solution might be to support sys.getfilesystemencoding and encode the filename first, but the JVM doesn't even seem to support the 'mbcs' encoding?? (correct me if I'm wrong)

Jython's os.listdir with a str arg also differs from what CPython returns for this file. I guess this should be expected since Jython lacks a sys.getfilesystemencoding() value on Windows

We return:

>>> os.listdir('.')
['\xfe.txt', '\u0100.txt', '\xff.txt']
>>> os.listdir(u'.')
[u'\xfe.txt', u'\u0100.txt', u'\xff.txt']

CPython 2.5:
>>> os.listdir('.')
['\xfe.txt', 'A.txt', '\xff.txt']
>>> os.listdir(u'.')
[u'\xfe.txt', u'\u0100.txt', u'\xff.txt']

You get 'A.txt' from u'\u0100.txt'.encode('mbcs') ('mbcs' being sys.getfilesystemencoding()).
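
To illustrate why a narrow-character API can lose this filename, here is a minimal sketch. It assumes cp1252 as a stand-in for the Windows ANSI code page (the thread does not say which code page was active) and uses Python's 'replace' error handler, so the unmappable character becomes '?' rather than the best-fit 'A' that mbcs produced above. The helper name to_code_page is hypothetical.

```python
# Assumption: cp1252 stands in for the ANSI code page behind 'mbcs'.
names = [u'\xfe.txt', u'\xff.txt', u'\u0100.txt']


def to_code_page(name, encoding='cp1252'):
    # Hypothetical helper: encode the way a narrow (non-wide) API would
    # see the name, replacing characters the code page cannot represent.
    return name.encode(encoding, 'replace')


encoded = [to_code_page(n) for n in names]

# Names that round-trip unchanged can still be found by a narrow API;
# the others would be looked up under a different name.
survives = [e.decode('cp1252') == n for e, n in zip(encoded, names)]
```

Under these assumptions the first two names survive the round trip and the third does not, which matches the pattern in the original report (254 and 255 work, 256 fails).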
msg6150 Author: Pekka Klärck (pekka.klarck) Date: 2010-10-06.22:49:06
I don't know enough to comment on the right way to fix this. If there's no better solution, using JavaPOSIX instead of WindowsPOSIX seems to work. Apparently the latter provides more functionality, but I think this bug is too severe to be left unfixed.

os.listdir returning different bytes on Jython than on CPython is also discussed in issue #1593.
msg6156 Author: Philip Jenvey (pjenvey) Date: 2010-10-07.21:20:38
So CPython has the 'mbcs' encoding as a generic name for the current Windows code page (CP_ACP) -- meaning mbcs could be one of many encodings depending on your locale. It also uses Windows system APIs for the encoding/decoding.

I'm not sure why it works this way -- maybe it's so CPython doesn't have to formally map all the various Windows encodings (including some of the odd Windows specific ones) to real encodings. Or maybe some of those encodings aren't supported on all Windows platforms.

The JVM's file.encoding property is derived from the current user's locale. The JVM maps the locale to one of its internal encodings. However it looks like it may fall back to UTF-8 in some cases.

So the JVM's file.encoding property could potentially be our filesystemencoding value on Windows. Would it be 100% reliable though?

And maybe we'd want to emulate the mbcs encoding for compatibility's sake?
msg6252 Author: Pekka Klärck (pekka.klarck) Date: 2010-11-17.11:11:19
The same problem appears also with Jython 2.5.2rc2.

Unfortunately my earlier workaround doesn't work anymore with this release, because `JavaPOSIX` and friends are no longer exposed in the `os` module. Is there any chance that the underlying bug will be fixed before 2.5.2 final, or should I try to find another workaround?
msg6780 Author: Pekka Klärck (pekka.klarck) Date: 2012-02-13.07:38:52
Based on my experimentation, java.lang.System.getProperty('file.encoding') returns the correct encoding to use. I submitted a separate issue, #1839, about implementing sys.getfilesystemencoding() using it.
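
As a rough sketch of that idea, the lookup could be wrapped as below. The function name guess_filesystem_encoding is hypothetical, and the fallback to sys.getfilesystemencoding() outside Jython is an assumption added only so the snippet also runs on CPython; it is not what #1839 proposes.

```python
import sys


def guess_filesystem_encoding():
    # Hypothetical helper, not part of Jython's API.
    try:
        # java.lang is only importable when running under Jython.
        from java.lang import System
    except ImportError:
        # Assumption: on other interpreters, defer to the built-in value.
        return sys.getfilesystemencoding()
    return System.getProperty('file.encoding')


encoding = guess_filesystem_encoding()
```

Whether file.encoding is reliable on every Windows locale is the open question raised earlier in this thread, so the result should be treated as a best guess.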
msg9358 Author: Jim Baker (zyasoft) Date: 2015-01-08.04:50:42
Still a problem on Windows, but not Linux, despite the fixes we have made for Unicode paths in #2239.

Likely the problem is that the underlying C stat function called by JNR Posix is mixing up Unicode and bytes.

But we are now on Java 7. Although BasicFileAttributes doesn't give us stuff like inode and device on Unix-like systems (or even the more extended Posix attributes), we don't have them anyway with JNR. Might as well use BasicFileAttributes then when running on Windows.
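
A minimal sketch of how the fields BasicFileAttributes does provide might map onto a stat result. The function name stat_result_from_basic is hypothetical, and the arguments are plain values standing in for what size(), lastAccessTime(), lastModifiedTime(), creationTime() and isDirectory() would return; the zeros for inode, device and ownership are an assumption reflecting the point above that those values are not available anyway.

```python
import os
import stat


def stat_result_from_basic(size, atime, mtime, ctime, is_dir):
    # Hypothetical mapping, not Jython's actual implementation.
    # Under Jython the inputs would come from
    # java.nio.file.Files.readAttributes(path, BasicFileAttributes),
    # with each FileTime converted to seconds since the epoch.
    if is_dir:
        mode = stat.S_IFDIR
    else:
        mode = stat.S_IFREG
    # Tuple order: mode, ino, dev, nlink, uid, gid, size, atime, mtime, ctime.
    # Fields BasicFileAttributes cannot supply are left as 0 (assumption).
    return os.stat_result((mode, 0, 0, 0, 0, 0, size, atime, mtime, ctime))
```

Callers such as os.path.isdir and shutil.rmtree mostly need st_mode, which this mapping can supply without going through the C stat call.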
msg9408 Author: Pekka Klärck (pekka.klarck) Date: 2015-01-16.13:35:32
This seems to be worse in 2.7 than in 2.5. I just reproduced this scenario with 2.7b4 preview on Windows 7:

1) Have a directory `xxx` with a subdirectory `日本語`.

2) Run `jython -c "import shutil; shutil.rmtree(u'xxx')"`

3) End result:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\jython2.7b4-soft\Lib\shutil.py", line 252, in rmtree
    onerror(os.remove, fullname, sys.exc_info())
  File "C:\jython2.7b4-soft\Lib\shutil.py", line 250, in rmtree
    os.remove(fullname)
OSError: [Errno 21] Is a directory: u'xxx\\\u65e5\u672c\u8a9e'


With Jython 2.5.3 the above works just fine. It also works with Python 2.7, but interestingly it fails with WindowsError if I give the path as str and not unicode. With the Jython versions, str vs. unicode doesn't seem to make any difference.
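
One plausible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: Python 2.7's shutil.rmtree calls os.lstat on each entry and treats a failed call as "not a directory", so a stat that cannot find the non-ASCII path would send the directory to os.remove. The sketch below imitates only that branch; pick_removal and broken_lstat are hypothetical names.

```python
import os
import stat
import tempfile


def pick_removal(fullname, lstat=os.lstat):
    # Simplified imitation of the per-entry decision in rmtree.
    try:
        mode = lstat(fullname).st_mode
    except OSError:
        mode = 0  # a failed stat is treated as "not a directory"
    if stat.S_ISDIR(mode):
        return 'rmtree'
    return 'os.remove'


def broken_lstat(path):
    # Hypothetical stand-in for a stat that cannot find the path.
    raise OSError(2, 'No such file or directory', path)


directory = tempfile.mkdtemp()
try:
    healthy = pick_removal(directory)
    broken = pick_removal(directory, lstat=broken_lstat)
finally:
    os.rmdir(directory)
```

With a working lstat the directory is recursed into; with the failing stand-in it is handed to os.remove, which would fail with "Is a directory" as in the traceback above.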
History
Date                 User          Action  Args
2015-01-16 13:35:32  pekka.klarck  set     messages: + msg9408
2015-01-08 04:50:50  zyasoft       set     resolution: accepted
2015-01-08 04:50:42  zyasoft       set     priority: high
                                           assignee: zyasoft
                                           messages: + msg9358
                                           nosy: + zyasoft
2013-03-01 00:22:18  amak          set     nosy: + amak
2013-02-26 17:36:22  fwierzbicki   set     nosy: + fwierzbicki
2012-02-13 07:38:52  pekka.klarck  set     messages: + msg6780
2010-11-17 11:11:19  pekka.klarck  set     messages: + msg6252
2010-10-07 21:20:39  pjenvey       set     messages: + msg6156
2010-10-06 22:49:07  pekka.klarck  set     messages: + msg6150
2010-10-03 22:05:47  pjenvey       set     messages: + msg6129
2010-09-26 23:09:52  pekka.klarck  set     messages: + msg6099
2010-09-26 23:04:38  pjenvey       set     nosy: + pjenvey
                                           messages: + msg6098
2010-09-26 21:23:44  pekka.klarck  create