Message12744

Author jeff.allen
Recipients jeff.allen, pekka.klarck
Date 2019-11-04.07:57:53
Message-id <1572854274.25.0.78767483013.issue2820@roundup.psfhosted.org>
In-reply-to
Content
Thanks for testing. It's the same on Windows. As you can see, the problem is that when we encounter bytes in the context of file paths, we assume they are UTF-8 encoded. A simpler test is:

f = open('hyv\xe4', 'w')

This works:

f = open(u'hyv\xe4', 'w')

But it means something different. (I now have a file called "hyvä".) Similarly, sys.path.append(u'hyv\xe4') produces the effect you expect.
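
To make the difference between the two literals concrete, here is a minimal sketch in plain Python. It is not the Jython implementation; the names raw, utf8_ok and latin1_name are only for illustration. It shows that the byte 0xE4 on its own is not valid UTF-8, while the same byte read as Latin-1 is the character that u'hyv\xe4' denotes.

```python
# The bytes that the Python 2 str literal 'hyv\xe4' holds.
raw = b'hyv\xe4'

# Assuming UTF-8 fails: 0xE4 announces a three-byte sequence,
# but the data ends immediately after it.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Reading the same bytes as Latin-1 gives the text that the
# unicode literal u'hyv\xe4' spells directly.
latin1_name = raw.decode('latin-1')
```

With the unicode literal no decoding step is needed at all, which is why the second open() call succeeds.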

There is an argument (and it won amongst the developers of CPython) that file names are arbitrary sequences of bytes. Unfortunately (?), Java wants a String, and generally we have lost the encoding of the bytes by the time we need to produce it (since this does not just affect file names).
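
For comparison only, this is roughly how CPython 3 keeps arbitrary bytes recoverable once they have become text, using the surrogateescape error handler. It is a sketch of CPython 3 behaviour, not something Jython 2.7 does, and the variable names are illustrative.

```python
# CPython 3 only: the 'surrogateescape' handler does not exist in Python 2.7.
raw = b'hyv\xe4'

# The undecodable byte 0xE4 is mapped to the lone surrogate U+DCE4
# instead of raising UnicodeDecodeError.
text = raw.decode('utf-8', 'surrogateescape')

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode('utf-8', 'surrogateescape')
```

The resulting string contains a lone surrogate, so it is not well-formed Unicode text, but the original byte sequence is not lost.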

I found this helpful: http://jonisalonen.com/2012/java-and-file-names-with-invalid-characters
However, we do not read locale information to set the file system encoding. A UTF-8 locale is almost universal these days on Linux.

Is the bug that, despite what would happen in an open() statement, the invalid directory should result in an ImportError? I.e. that should be the result of *anything* that goes wrong during an import?
History
Date                 User        Action  Args
2019-11-04 07:57:54  jeff.allen  set     messageid: <1572854274.25.0.78767483013.issue2820@roundup.psfhosted.org>
2019-11-04 07:57:54  jeff.allen  set     recipients: + jeff.allen, pekka.klarck
2019-11-04 07:57:54  jeff.allen  link    issue2820 messages
2019-11-04 07:57:54  jeff.allen  create