Message11337

Author jeff.allen
Recipients bstjean, jeff.allen, liuxy_hes86, zyasoft
Date 2017-05-01.14:28:41
Message-id <1493648922.14.0.386784708518.issue2356@psf.upfronthosting.co.za>
In-reply-to
Content
In a significant change of approach (see #1839) I have addressed this by making sys.getfilesystemencoding() == 'utf-8' and it works pretty well. I've tweaked a lot of existing code, some of it quite old. I have published it here:

https://bitbucket.org/tournesol/jython-utf8

in case anyone sees a massive flaw. If not, I'll push to the main repo.

The current regression test runs for my user name "Épreuve" and passes, but not yet for "用户名". I think we are still assuming bytes are unicode in some places. So I estimate that Benoît is now ok, but there's more to do for 雪彦.

Just to show off a bit what we can do:

> dist\bin\jython
Jython 2.7.1rc1 (default:060e4e4a06d8, Apr 30 2017, 23:08:20)
[Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.7.0_60
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, os, os.path
>>> os.getcwd()
'C:\\Users\\\xe7\x94\xa8\xe6\x88\xb7\xe5\x90\x8d\\Documents\\Jython\\utf-8'
>>> print os.getcwdu()
C:\Users\用户名\Documents\Jython\utf-8
>>> f = open(os.path.join(u'c-\u5496\u5561', u'\u56f0\u96be.txt'), 'wb')
>>> print f.name
c-咖啡\困难.txt
>>> f.close()
>>> f = open(os.path.join(u's-\U0001f40d', u'pythón'), 'wb')
>>> f
<open file u's-\U0001f40d\\pyth\xf3n', mode 'wb' at 0x3>
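The byte string that os.getcwd() returns in the session above is the UTF-8 encoding of the unicode path printed by os.getcwdu(). The following is a minimal sketch of that relationship, and of one way the "bytes are unicode" assumption could go wrong. The latin-1 decoding step is only an illustrative assumption about what such a bug might look like, not something taken from the Jython code.

```python
# -*- coding: utf-8 -*-
# The user-name component of the path from the session (用户名).
name = u'\u7528\u6237\u540d'

# Encoding with the file system encoding ('utf-8') gives the bytes
# seen in the os.getcwd() result.
encoded = name.encode('utf-8')
assert encoded == b'\xe7\x94\xa8\xe6\x88\xb7\xe5\x90\x8d'

# Decoding with the same encoding recovers the original text.
assert encoded.decode('utf-8') == name

# Illustrative assumption: code that treats each byte as one character
# (equivalent to decoding as latin-1) produces nine wrong characters
# instead of three correct ones.
misread = encoded.decode('latin-1')
assert misread != name
assert len(misread) == 9
```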

I observe that it is mostly a non-ASCII installation location, current directory or TMP/TEMP that causes the trouble. I can perhaps simulate those things without actually changing my user name (which tends to break the tools I need). It's also a clue to a work-around.
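One way to simulate a non-ASCII current directory and TMP/TEMP without touching the user name might look like the sketch below. The helper name non_ascii_environment is hypothetical, and the code is written for CPython 3 and has not been tried on Jython. It covers only the working directory and the TMP/TEMP variables, not the installation location.

```python
# -*- coding: utf-8 -*-
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def non_ascii_environment(name=u'\u7528\u6237\u540d'):
    """Temporarily work in a non-ASCII directory (hypothetical helper).

    Creates a scratch directory whose last component is `name`, points
    TMP and TEMP at it and makes it the current directory. Everything
    is restored, and the directory removed, on exit.
    """
    base = tempfile.mkdtemp()
    target = os.path.join(base, name)
    os.mkdir(target)
    saved_cwd = os.getcwd()
    saved_env = dict((key, os.environ.get(key)) for key in ('TMP', 'TEMP'))
    try:
        # Child processes started inside the block inherit these values.
        os.environ['TMP'] = os.environ['TEMP'] = target
        os.chdir(target)
        yield target
    finally:
        os.chdir(saved_cwd)
        for key, value in saved_env.items():
            if value is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = value
        shutil.rmtree(base)
```

Anything launched inside the with block, for example a test run started as a subprocess, sees the non-ASCII directory as both its working directory and its temporary location.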
History
Date                 User        Action  Args
2017-05-01 14:28:42  jeff.allen  set     messageid: <1493648922.14.0.386784708518.issue2356@psf.upfronthosting.co.za>
2017-05-01 14:28:42  jeff.allen  set     recipients: + jeff.allen, zyasoft, liuxy_hes86, bstjean
2017-05-01 14:28:42  jeff.allen  link    issue2356 messages
2017-05-01 14:28:41  jeff.allen  create