Created on 2015-05-20.02:21:29 by liuxy_hes86, last changed 2017-03-27.07:45:38 by jeff.allen.
|OpenRefineProblem.txt||bstjean, 2017-03-16.03:37:46||OpenRefine error trace|
|msg10069 (view)||Author: liuxy (liuxy_hes86)||Date: 2015-05-20.02:21:28|
On a Windows 8.1 PC, running jython from cmd produced the following error:

    C:\Users\雪彦>jython
    Exception in thread "main" java.lang.IllegalArgumentException: Cannot create PyString with non-byte value
        at org.python.core.PyString.<init>(PyString.java:64)
        at org.python.core.PyString.<init>(PyString.java:70)
        at org.python.core.packagecache.PathPackageManager.addDirectory(PathPackageManager.java:201)
        at org.python.core.packagecache.PathPackageManager.addClassPath(PathPackageManager.java:232)
        at org.python.core.packagecache.SysPackageManager.findAllPackages(SysPackageManager.java:96)
        at org.python.core.packagecache.SysPackageManager.<init>(SysPackageManager.java:39)
        at org.python.core.PySystemState.initPackages(PySystemState.java:1127)
        at org.python.core.PySystemState.doInitialize(PySystemState.java:1057)
        at org.python.core.PySystemState.initialize(PySystemState.java:974)
        at org.python.core.PySystemState.initialize(PySystemState.java:930)
        at org.python.core.PySystemState.initialize(PySystemState.java:925)
        at org.python.util.jython.run(jython.java:263)
        at org.python.util.jython.main(jython.java:142)
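The exception message suggests that a byte-oriented string was being built from text containing characters outside the byte range, presumably a path that includes the user name. The sketch below only illustrates that idea in plain Python; the helper name `fits_in_byte_string` is hypothetical and is not Jython's actual check.

```python
def fits_in_byte_string(text):
    """Return True if every character's code point is below 256.

    Assumption: this mirrors the idea behind the "non-byte value"
    message; it is not taken from the Jython source.
    """
    return all(ord(ch) < 256 for ch in text)


# The reporter's user name has code points far above 255.
chinese_name_ok = fits_in_byte_string(u"雪彦")

# An accented Latin-1 name still fits in single bytes.
french_name_ok = fits_in_byte_string(u"\xc9preuve")
```

Under this reading, an accented Latin-1 name would pass this particular check and fail elsewhere, while a Chinese name fails immediately.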
|msg10070 (view)||Author: Jim Baker (zyasoft)||Date: 2015-05-20.06:38:26|
Likely a duplicate of #2348
|msg10258 (view)||Author: Jeff Allen (jeff.allen)||Date: 2015-09-13.16:42:31|
Probably same as test_os_jy failure in #2397.
|msg10265 (view)||Author: Jeff Allen (jeff.allen)||Date: 2015-09-19.09:51:23|
We're both right. Running Jython 2.7.1b1 founders on #2397, but running a version with that fix, it dies importing site packages.

    C:\Users\用户名\Documents\Jython> %jt%\dist\bin\jython
    Exception in thread "main" Traceback (most recent call last):
      File "C:\Users\Jeff\Documents\Eclipse\jython-trunk\dist\Lib\site.py", line 585, in <module>
      ...
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 9-11: ordinal not in range(128)

Skip the site import and you can get a prompt.

    C:\Users\用户名\Documents\Jython> %jt%\dist\bin\jython -S
    Jython 2.7.1 (default:26d248c72b90+, Sep 19 2015, 08:44:17)
    [Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.7.0_60
    >>>

I think it would do us all good to work under Chinese user names for a while!
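The reported position 9-11 is consistent with the three characters of the user name in a path beginning C:\Users\. A minimal sketch, assuming the string being encoded was the working directory shown at the prompt (the elided traceback does not confirm which string it was):

```python
# Assumption: the path below is the one from the prompt above; the
# traceback does not show which string was actually being encoded.
path = u"C:\\Users\\用户名\\Documents\\Jython"

try:
    path.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = exc

# exc.start is the first offending index, exc.end is one past the last,
# so "position 9-11" corresponds to start == 9 and end == 12.
first_bad = error.start
last_bad = error.end - 1
```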
|msg11237 (view)||Author: Benoit St-Jean (bstjean)||Date: 2017-03-16.03:37:45|
In the same vein, I have a similar exception (originates from OpenRefine at startup). Looks like Jython and/or Java doesn't like my username in Windows 10 and bombs. My Windows 10 user name is "Benoît St-Jean" (notice the accented î).
|msg11261 (view)||Author: Jeff Allen (jeff.allen)||Date: 2017-03-22.07:14:03|
We're not very good with non-ASCII paths and program text, certainly on Windows, and in more than one part of the code, I suspect. E.g. I have to tweak even build.xml when I'm logged in as "Épreuve". :( Minable. I'll give this some more time, as I've meant to for a while.
|msg11272 (view)||Author: Jeff Allen (jeff.allen)||Date: 2017-03-25.10:57:43|
I've fixed the build, the problem being that ANTLR would generate files in file.encoding and then we would compile them as UTF-8. It makes no difference to the *text*, but the *comments* contain the full source path. C:\Users\Épreuve\atelier\ ... blahblah ... . Now file.encoding=UTF-8.

I'm fighting the launcher now, in the shape of jython.py. One can easily create a complicated situation in which all sorts of encodings are in play. Just at the DOS and Python prompts:

    > type argtest.py
    # What do arguments appear as, when codepages intervene?
    import sys, os, locale, subprocess
    print sys.argv
    for arg in sys.argv:
        print "%s ( %r )" % (arg, arg)

    > chcp
    Active code page: 850

    > set TEST=Épreuve

    > python -i argtest.py café crème %TEST%
    ['argtest.py', 'caf\xe9', 'cr\xe8me', '\xc9preuve']
    argtest.py ( 'argtest.py' )
    cafÚ ( 'caf\xe9' )
    crÞme ( 'cr\xe8me' )
    ╔preuve ( '\xc9preuve' )

    ### Notice that sys.argv contains byte strings but they are
    ### not encoded with the console encoding cp850.
    ### The os module is using the same encoding.
    >>> os.getcwd()
    'C:\\Users\\\xc9preuve\\Documents\\Python2'
    >>> print os.getcwd()
    C:\Users\╔preuve\Documents\Python2
    >>> print os.getcwdu()
    C:\Users\Épreuve\Documents\Python2
    >>> os.getenv('TEST')
    '\xc9preuve'

    ### There are plenty of encodings to choose from.
    >>> sys.stdout.encoding
    'cp850'
    >>> sys.getdefaultencoding()
    'ascii'
    >>> sys.getfilesystemencoding()
    'mbcs'
    >>> locale.getpreferredencoding()
    'cp1252'

    ### But this one is consistent with what I'm seeing:
    >>> for a in sys.argv: print a.decode(locale.getpreferredencoding())
    ...
    argtest.py
    café
    crème
    Épreuve

What fun! I *tentatively* conclude we must treat arguments and environment variables as encoded with locale.getpreferredencoding().
This also seems to be the acceptable encoding when we come to launch a subprocess:

    >>> subprocess.call(["python", "argtest.py"] + sys.argv[1:])
    ['argtest.py', 'caf\xe9', 'cr\xe8me', '\xc9preuve']
    argtest.py ( 'argtest.py' )
    cafÚ ( 'caf\xe9' )
    crÞme ( 'cr\xe8me' )
    ╔preuve ( '\xc9preuve' )

The point here is not that these print correctly, but that they print the same as they did when I ran this from the DOS prompt.

Now, in jython.py, it's all driven from sys.stdout.encoding, which is different. We may even be calling encode() where we should be decoding. Or possibly we could just leave everything as bytes in the seemingly-consistent encoding of CPython and Windows. I'll see what I can do. (I'll try not to break jython.py for Linux, though it seems the minority case here.)

Eventually, when Jython launches again, I'll get to the bug(s) our French and Chinese users are experiencing, which pop up first in site.py. But fighting jython.py has been instructive. There may be lessons from CPython here about what we should be doing internally to Jython when handling byte strings from the system via file system, environment and arguments.
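The mismatch in the session above can be reproduced without a Windows console by decoding the same argument bytes with each code page. This is only a sketch: the byte values are copied from the transcript, and `decode_all` is a hypothetical helper, not part of jython.py.

```python
# Argument bytes exactly as they appear in sys.argv in the transcript.
RAW_ARGS = [b"caf\xe9", b"cr\xe8me", b"\xc9preuve"]


def decode_all(raw_args, encoding):
    """Decode each byte string with the given code page."""
    return [raw.decode(encoding) for raw in raw_args]


# Interpreting the bytes as cp850 (the console code page) yields the
# garbled forms the console printed.
as_console = decode_all(RAW_ARGS, "cp850")

# Interpreting them as cp1252 (the preferred encoding reported in the
# transcript) recovers the text that was typed.
as_preferred = decode_all(RAW_ARGS, "cp1252")
```

The same bytes are therefore readable only under cp1252, which matches the tentative conclusion that arguments arrive in the preferred encoding and not in the console encoding.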
|msg11276 (view)||Author: Jeff Allen (jeff.allen)||Date: 2017-03-27.07:45:38|
I've re-written jython.py to use Unicode internally, decoding args and environment variables in-bound, and encoding for subprocess.call() out-bound. Both times we use locale.getpreferredencoding(), which is cp1252 on my system while the console encoding is cp850. It passes test_jython_launcher for a user named "Épreuve" as long as I suppress the site module with -S. Interestingly, both virtualenv and PyInstaller (on Python 2.7.13) fail for this user with: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc9 ... .
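A minimal sketch of the decode-in, encode-out pattern described in this message, not the actual jython.py code. The helper names `decode_inbound` and `encode_outbound` are hypothetical, and cp1252 is passed explicitly only so the example behaves the same on any machine; the message describes using locale.getpreferredencoding() for both directions.

```python
import locale


def decode_inbound(raw_values, encoding=None):
    """Turn byte strings from argv or the environment into Unicode text."""
    encoding = encoding or locale.getpreferredencoding()
    return [v.decode(encoding) if isinstance(v, bytes) else v
            for v in raw_values]


def encode_outbound(values, encoding=None):
    """Turn Unicode text back into bytes before handing it to a subprocess."""
    encoding = encoding or locale.getpreferredencoding()
    return [v if isinstance(v, bytes) else v.encode(encoding)
            for v in values]


# Bytes as they appeared in sys.argv in the earlier transcript.
raw = [b"caf\xe9", b"\xc9preuve"]

text = decode_inbound(raw, "cp1252")          # work in Unicode internally
round_tripped = encode_outbound(text, "cp1252")  # bytes for the child process
```

Because the same encoding is used in both directions, the bytes handed to the child process are identical to those received, which is the property the subprocess experiment above relied on.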
+ test failure causes|
messages: + msg11276
|2017-03-25 10:57:44||jeff.allen||set||messages: + msg11272|
|2017-03-22 07:14:04||jeff.allen||set||messages: + msg11261|
nosy: + bstjean
messages: + msg11237
|2015-09-19 09:51:24||jeff.allen||set||messages: + msg10265|
|2015-09-13 16:42:31||jeff.allen||set||assignee: jeff.allen|
messages: + msg10258
nosy: + jeff.allen
messages: + msg10070