Issue2356

classification
Title: java.lang.IllegalArgumentException when starting Jython on Windows 8.1 if the current username contains non-ASCII characters
Type: crash Severity: major
Components: Core Versions: Jython 2.7
Milestone: Jython 2.7.0
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: jeff.allen Nosy List: bstjean, jeff.allen, liuxy_hes86, zyasoft
Priority: Keywords: test failure causes

Created on 2015-05-20.02:21:29 by liuxy_hes86, last changed 2017-03-27.07:45:38 by jeff.allen.

Files
File name Uploaded Description
OpenRefineProblem.txt bstjean, 2017-03-16.03:37:46 OpenRefine error trace
Messages
msg10069 Author: liuxy (liuxy_hes86) Date: 2015-05-20.02:21:28
On a Windows 8.1 PC, running jython from cmd produces the following error:

C:\Users\雪彦>jython
Exception in thread "main" java.lang.IllegalArgumentException: Cannot create PyString with non-byte value
        at org.python.core.PyString.<init>(PyString.java:64)
        at org.python.core.PyString.<init>(PyString.java:70)
        at org.python.core.packagecache.PathPackageManager.addDirectory(PathPackageManager.java:201)
        at org.python.core.packagecache.PathPackageManager.addClassPath(PathPackageManager.java:232)
        at org.python.core.packagecache.SysPackageManager.findAllPackages(SysPackageManager.java:96)
        at org.python.core.packagecache.SysPackageManager.<init>(SysPackageManager.java:39)
        at org.python.core.PySystemState.initPackages(PySystemState.java:1127)
        at org.python.core.PySystemState.doInitialize(PySystemState.java:1057)
        at org.python.core.PySystemState.initialize(PySystemState.java:974)
        at org.python.core.PySystemState.initialize(PySystemState.java:930)
        at org.python.core.PySystemState.initialize(PySystemState.java:925)
        at org.python.util.jython.run(jython.java:263)
        at org.python.util.jython.main(jython.java:142)
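
For readers following the trace: the message suggests that PyString refuses any string containing a character that does not fit in one byte. The Python 3 sketch below mimics that check under the assumption that the test is "every code point is below 256". The name fits_in_pystring is hypothetical and is not a Jython API. If the assumption holds, a Chinese user name would be rejected at this point, while an accented Latin-1 name such as "Épreuve" would get past it and fail later.

```python
# Hypothetical helper mimicking the ASSUMED PyString constraint:
# every character must have a code point that fits in a single byte.
def fits_in_pystring(text):
    """Return True if every character has a code point below 256."""
    return all(ord(ch) < 256 for ch in text)


print(fits_in_pystring(u"C:\\Users\\雪彦"))     # False
print(fits_in_pystring(u"C:\\Users\\Épreuve"))  # True
```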
msg10070 Author: Jim Baker (zyasoft) Date: 2015-05-20.06:38:26
Likely a duplicate of #2348
msg10258 Author: Jeff Allen (jeff.allen) Date: 2015-09-13.16:42:31
Probably same as test_os_jy failure in #2397.
msg10265 Author: Jeff Allen (jeff.allen) Date: 2015-09-19.09:51:23
We're both right. Running Jython 2.7.1b1 founders on #2397, but running a version with that fix, it dies importing site packages.

C:\Users\用户名\Documents\Jython> %jt%\dist\bin\jython
Exception in thread "main" Traceback (most recent call last):
  File "C:\Users\Jeff\Documents\Eclipse\jython-trunk\dist\Lib\site.py", line 585, in <module>
...
UnicodeEncodeError: 'ascii' codec can't encode characters in position 9-11: ordinal not in range(128)
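
The reported position 9-11 is consistent with a path beginning C:\Users\用户名 being encoded as ASCII, since the three Chinese characters sit at indices 9 to 11. Here is a minimal Python 3 sketch of that failure. It assumes the string involved looks like the working directory shown at the prompt; the traceback is elided, so that is a guess rather than a diagnosis.

```python
# Assumed input: a path shaped like the working directory at the prompt.
path = u"C:\\Users\\用户名\\Documents\\Jython"

failure = None
try:
    path.encode("ascii")
except UnicodeEncodeError as err:
    failure = err  # keep a reference; 'err' is cleared after the except block

# 'end' is exclusive, so the last offending index is end - 1.
print(failure.start, failure.end - 1)  # 9 11
```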

Skip the site import and you can get a prompt.

C:\Users\用户名\Documents\Jython> %jt%\dist\bin\jython -S
Jython 2.7.1 (default:26d248c72b90+, Sep 19 2015, 08:44:17)
[Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.7.0_60
>>>

I think it would do us all good to work under Chinese user names for a while!
msg11237 Author: Benoit St-Jean (bstjean) Date: 2017-03-16.03:37:45
In the same vein, I have a similar exception (it originates from OpenRefine at startup). It looks like Jython and/or Java doesn't like my Windows 10 user name and bombs. My Windows 10 user name is "Benoît St-Jean" (notice the accented î).
msg11261 Author: Jeff Allen (jeff.allen) Date: 2017-03-22.07:14:03
We're not very good with non-ASCII paths and program text, certainly on Windows, and I suspect in more than one part of the code. E.g. I have to tweak even build.xml when I'm logged in as "Épreuve". :( Minable.

I'll give this some more time, as I've meant to for a while.
msg11272 Author: Jeff Allen (jeff.allen) Date: 2017-03-25.10:57:43
I've fixed the build, the problem being that ANTLR would generate files in file.encoding and then we would compile them as UTF-8. It makes no difference to the *text*, but the *comments* contain the full source path. C:\Users\Épreuve\atelier\ ... blahblah ... . Now file.encoding=UTF-8.
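
To illustrate the mismatch in Python 3: a comment containing the path, written in a single-byte encoding, is not valid UTF-8. The sketch assumes the old file.encoding was cp1252 (not stated above), and the comment text is a hypothetical stand-in for what ANTLR emits.

```python
# Hypothetical generated comment containing the (truncated) source path.
comment = u"// C:\\Users\\Épreuve\\atelier"

# ASSUMPTION: the platform file.encoding was cp1252 when the file was written.
written = comment.encode("cp1252")

try:
    written.decode("utf-8")
    readable_as_utf8 = True
except UnicodeDecodeError:
    # 0xC9 starts a two-byte UTF-8 sequence, but 'p' is not a continuation byte.
    readable_as_utf8 = False

print(readable_as_utf8)  # False

# Writing and reading with the same encoding round-trips cleanly.
print(comment.encode("utf-8").decode("utf-8") == comment)  # True
```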

I'm fighting the launcher now, in the shape of jython.py. One can easily create a complicated situation in which all sorts of encodings are in play. Just at the DOS and Python prompts:

> type argtest.py
# What do arguments appear as, when codepages intervene?
import sys, os, locale, subprocess
print sys.argv
for arg in sys.argv:
    print "%s ( %r )" % (arg, arg)

> chcp
Active code page: 850

> set TEST=Épreuve

> python -i argtest.py café crème %TEST%
['argtest.py', 'caf\xe9', 'cr\xe8me', '\xc9preuve']
argtest.py ( 'argtest.py' )
cafÚ ( 'caf\xe9' )
crÞme ( 'cr\xe8me' )
╔preuve ( '\xc9preuve' )

### Notice that sys.argv contains byte strings but they are
### not encoded with the console encoding cp850.
### The os module is using the same encoding.

>>> os.getcwd()
'C:\\Users\\\xc9preuve\\Documents\\Python2'
>>> print os.getcwd()
C:\Users\╔preuve\Documents\Python2
>>> print os.getcwdu()
C:\Users\Épreuve\Documents\Python2
>>> os.getenv('TEST')
'\xc9preuve'

### There are plenty of encodings to choose from.

>>> sys.stdout.encoding
'cp850'
>>> sys.getdefaultencoding()
'ascii'
>>> sys.getfilesystemencoding()
'mbcs'
>>> locale.getpreferredencoding()
'cp1252'

### But this one is consistent with what I'm seeing:

>>> for a in sys.argv: print a.decode(locale.getpreferredencoding())
...
argtest.py
café
crème
Épreuve

What fun! I *tentatively* conclude we must treat arguments and environment variables as encoded with locale.getpreferredencoding(). This also seems to be the acceptable encoding when we come to launch a subprocess:
>>> subprocess.call(["python", "argtest.py"] + sys.argv[1:])
['argtest.py', 'caf\xe9', 'cr\xe8me', '\xc9preuve']
argtest.py ( 'argtest.py' )
cafÚ ( 'caf\xe9' )
crÞme ( 'cr\xe8me' )
╔preuve ( '\xc9preuve' )

The point here is not that these print correctly, but that they print the same as they did when I ran this from the DOS prompt.
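
The console output above can be reproduced without a Windows box. This Python 3 sketch decodes the same argument bytes under both code pages: cp1252 recovers the intended text, and cp850 yields exactly the garbled forms printed by the console. The encodings are named explicitly here rather than taken from the locale, so the result does not depend on the machine it runs on.

```python
# The byte strings that appeared in sys.argv in the session above.
argv_bytes = [b"caf\xe9", b"cr\xe8me", b"\xc9preuve"]

# Decoded with the ANSI code page (locale.getpreferredencoding() in that session).
as_cp1252 = [arg.decode("cp1252") for arg in argv_bytes]

# Decoded with the console code page (sys.stdout.encoding in that session).
as_cp850 = [arg.decode("cp850") for arg in argv_bytes]

print(as_cp1252 == [u"café", u"crème", u"Épreuve"])  # True
print(as_cp850 == [u"cafÚ", u"crÞme", u"╔preuve"])   # True
```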

Now, in jython.py, it's all driven from sys.stdout.encoding, which is different. We may even be calling encode() where we should be decoding. Or possibly we could just leave everything as bytes in the seemingly-consistent encoding of CPython and Windows. I'll see what I can do. (I'll try not to break jython.py for Linux, though it seems the minority case here.)

Eventually, when Jython launches again, I'll get to the bug(s) our French and Chinese users are experiencing, which pop up first in site.py.

But fighting jython.py has been instructive. There may be lessons from CPython here about what we should be doing internally to Jython when handling byte strings from the system via file system, environment and arguments.
msg11276 Author: Jeff Allen (jeff.allen) Date: 2017-03-27.07:45:38
I've re-written jython.py to use Unicode internally, decoding args and environment variables in-bound, and encoding for subprocess.call() out-bound. Both times we use locale.getpreferredencoding(), which is cp1252 on my system while the console encoding is cp850. It passes test_jython_launcher for a user named "Épreuve" as long as I suppress the site module with -S.
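
The shape of that approach, as a rough sketch rather than the actual jython.py code: decode bytes to Unicode on the way in, work in Unicode, and encode again just before handing the command to a subprocess. The helper names decode_inbound and encode_outbound are hypothetical. The example passes "cp1252" explicitly so that it behaves the same on any machine; left unset, the helpers fall back to locale.getpreferredencoding().

```python
import locale


def decode_inbound(values, encoding=None):
    """Decode byte strings (args, environment values) to Unicode text."""
    encoding = encoding or locale.getpreferredencoding()
    return [v.decode(encoding) if isinstance(v, bytes) else v for v in values]


def encode_outbound(values, encoding=None):
    """Encode Unicode text back to bytes for the child process command line."""
    encoding = encoding or locale.getpreferredencoding()
    return [v if isinstance(v, bytes) else v.encode(encoding) for v in values]


# In-bound: bytes as they would arrive from the system.
args = decode_inbound([b"caf\xe9", b"\xc9preuve"], "cp1252")

# Out-bound: the command as it would be passed on to a subprocess.
command = encode_outbound([u"python", u"argtest.py"] + args, "cp1252")

print(args == [u"café", u"Épreuve"])  # True
print(command == [b"python", b"argtest.py", b"caf\xe9", b"\xc9preuve"])  # True
```

Using one encoding in both directions is what makes the round trip lossless; mixing in the console encoding on either side would reintroduce the garbling seen earlier.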

Interestingly, both virtualenv and PyInstaller (on Python 2.7.13) fail for this user with: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc9 ... .
History
Date User Action Args
2017-03-27 07:45:39 jeff.allen set keywords: + test failure causes, messages: + msg11276
2017-03-25 10:57:44 jeff.allen set messages: + msg11272
2017-03-22 07:14:04 jeff.allen set messages: + msg11261
2017-03-16 03:37:47 bstjean set files: + OpenRefineProblem.txt, nosy: + bstjean, messages: + msg11237
2015-09-19 09:51:24 jeff.allen set messages: + msg10265
2015-09-13 16:42:31 jeff.allen set assignee: jeff.allen, messages: + msg10258, nosy: + jeff.allen
2015-05-20 06:38:27 zyasoft set nosy: + zyasoft, messages: + msg10070
2015-05-20 02:21:29 liuxy_hes86 create