Issue2608

classification
Title: Encoding problem in os.uname with non-ascii host name
Type: crash Severity: normal
Components: Library Versions:
Milestone: Jython 2.7.2
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: jeff.allen Nosy List: jeff.allen
Priority: Keywords:

Created on 2017-07-18.07:47:03 by jeff.allen, last changed 2017-09-05.20:51:20 by zyasoft.

Messages
msg11478 Author: Jeff Allen (jeff.allen) Date: 2017-07-18.07:47:02
In https://github.com/jythontools/jython/issues/83 the user encounters an error that I tentatively attribute to our failure to handle his host name correctly here: https://hg.python.org/jython/file/tip/src/org/python/modules/posix/PosixModule.java#l1174 . I think we should be encoding the String(s) with the file-system (FS) encoding.

I have not reproduced this myself yet. (I need to change the host name to include a character above 255, i.e. outside Latin-1.)
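To make the proposal concrete, here is a minimal Python sketch of what "FS-encoding" the uname fields would mean. The real change belongs in the Java of `PosixModule.uname`; `uname_as_bytes` is a hypothetical helper, not part of Jython. It assumes the fields arrive as Unicode text (as a Java String would) and encodes each one, producing the byte strings a Python 2 `str` is expected to hold.

```python
import sys


def uname_as_bytes(fields, encoding=None):
    """Return a tuple of byte strings, each text field encoded with `encoding`.

    Hypothetical helper. When `encoding` is omitted it uses the running
    interpreter's file-system encoding, mirroring the proposal to FS-encode
    the host name instead of building a byte string from it directly.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    return tuple(field.encode(encoding) for field in fields)


if __name__ == "__main__":
    # Illustrative values only; the host name is the one used in this issue.
    fields = (u"Windows", u"\u5148\u77e5_MICAH", u"10", u"10.0.14393", u"AMD64")
    print(uname_as_bytes(fields, "utf-8"))
```

Passing the encoding explicitly keeps the result deterministic; when it is omitted, which bytes appear depends on `sys.getfilesystemencoding()` of the interpreter running the code.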
msg11480 Author: Jeff Allen (jeff.allen) Date: 2017-07-18.20:57:16
I have reproduced this with the host name 先知_MICAH. In fact, there are at least two places to fix.

> dist\bin\jython
Jython 2.7.1 (default:0df7adb1b397, Jul 18 2017, 21:24:53)
[Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.7.0_60
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, platform
>>> os.uname()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
        at org.python.core.PyString.<init>(PyString.java:57)
        at org.python.core.PyString.<init>(PyString.java:70)
        at org.python.core.PyString.<init>(PyString.java:74)
        at org.python.core.Py.newString(Py.java:647)
        at org.python.modules.posix.PosixModule.uname(PosixModule.java:1169)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Cannot create PyString with non-byte value
>>> platform.uname()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Jeff\Documents\Eclipse\jython-trunk\dist\Lib\platform.py", line 1212, in uname
    node = _node()
  File "C:\Users\Jeff\Documents\Eclipse\jython-trunk\dist\Lib\platform.py", line 990, in _node
    return socket.gethostname()
  File "C:\Users\Jeff\Documents\Eclipse\jython-trunk\dist\Lib\platform.py", line 990, in _node
    return socket.gethostname()
  File "C:\Users\Jeff\Documents\Eclipse\jython-trunk\dist\Lib\_socket.py", line 382, in handle_exception
    return method_or_function(*args, **kwargs)
  File "C:\Users\Jeff\Documents\Eclipse\jython-trunk\dist\Lib\_socket.py", line 382, in handle_exception
    return method_or_function(*args, **kwargs)
  File "C:\Users\Jeff\Documents\Eclipse\jython-trunk\dist\Lib\_socket.py", line 382, in handle_exception
    return method_or_function(*args, **kwargs)
  File "C:\Users\Jeff\Documents\Eclipse\jython-trunk\dist\Lib\_socket.py", line 1875, in gethostname
    return str(InetAddress.getLocalHost().getHostName())
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
>>> '\xcf\xc8\xd6\xaa_MICAH'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Jeff\Documents\Eclipse\jython-trunk\dist\Lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: invalid data
>>> '\xcf\xc8\xd6\xaa_MICAH'.decode('cp936')
u'\u5148\u77e5_MICAH'

It is interesting that platform.uname fails down in _socket.py. It makes me think we should look suspiciously at every place where we str-ingify a Java String, to see whether we should be FS-encoding it (or doing something else). I have attempted this throughout our Java source, but not in the Python.
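The `UnicodeEncodeError` from `_socket.py` is what Python 2's `str()` does to non-ASCII text: it encodes implicitly with the default ASCII codec. A minimal sketch, written to run under Python 3 (so the implicit conversion is modelled by an explicit `.encode("ascii")`), contrasting that with an explicit file-system encoding; `implicit_str` and `fs_str` are hypothetical names used only for illustration.

```python
def implicit_str(text):
    # Models Python 2's str(u"...") on Unicode text: it encodes with the
    # default codec, ASCII, so any non-ASCII character raises.
    return text.encode("ascii")


def fs_str(text, encoding):
    # The alternative: encode explicitly with the file-system encoding
    # (passed in here so the example does not depend on the platform).
    return text.encode(encoding)


host = u"\u5148\u77e5_MICAH"
try:
    implicit_str(host)
except UnicodeEncodeError as exc:
    # The two CJK characters occupy positions 0-1, as in the traceback.
    print(exc.start, exc.end - 1)  # 0 1
print(fs_str(host, "utf-8"))
```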


For interest, the behaviour of CPython is:

------------------------------------------------ 2
> python
Python 2.7.13 (v2.7.13:a06454b1afa1, Dec 17 2016, 20:53:40) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, platform
>>> platform.uname()
('Windows', '\xcf\xc8\xd6\xaa_MICAH', '10', '10.0.14393', 'AMD64', 'AMD64 Family 16 Model 5 Stepping 2, AuthenticAMD')
>>> print platform.uname()[1].decode(sys.getfilesystemencoding())
先知_MICAH

So in Python 2, the host name should appear as bytes in the file-system encoding.
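Under that convention the node name is only meaningful together with the encoding that produced it, which is why the decode experiment earlier succeeds with cp936 but fails with UTF-8. A small Python 3 sketch of this, assuming cp936 is the relevant encoding for the CPython session and UTF-8 for Jython (the byte values are copied from the sessions in this issue); `decode_node` is a hypothetical helper.

```python
import sys


def decode_node(raw, encoding=None):
    """Decode a byte-string node name back to text.

    Defaults to the file-system encoding, as in
    platform.uname()[1].decode(sys.getfilesystemencoding()).
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    return raw.decode(encoding)


cpython_node = b"\xcf\xc8\xd6\xaa_MICAH"         # CPython 2 on Windows (cp936 bytes)
jython_node = b"\xe5\x85\x88\xe7\x9f\xa5_MICAH"  # UTF-8 bytes for the same name

# Different bytes, same name, provided each is decoded with its own encoding.
assert decode_node(cpython_node, "cp936") == decode_node(jython_node, "utf-8")

try:
    decode_node(cpython_node, "utf-8")
except UnicodeDecodeError as exc:
    print("cp936 bytes are not valid UTF-8:", exc.reason)
```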

------------------------------------------------ 3
> python
Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 25 2016, 22:18:55) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import platform
>>> platform.uname()
uname_result(system='Windows', node='先知_MICAH', release='10', version='10.0.14393', machine='AMD64', processor='AMD64 Family 16 Model 5 Stepping 2, AuthenticAMD')
>>> platform.uname().node
'先知_MICAH'
msg11483 Author: Jeff Allen (jeff.allen) Date: 2017-07-20.07:49:21
I claim this is fixed in: https://hg.python.org/jython/rev/c3e2799ef812

>>> import os, platform
>>> os.uname()
('Windows', '\xe5\x85\x88\xe7\x9f\xa5_MICAH', '8.1', '10.0.14393', 'AMD64')
>>> platform.uname()
('Java', '\xe5\x85\x88\xe7\x9f\xa5_MICAH', '1.7.0_60', 'Java HotSpot(TM) 64-Bit Server VM, 24.60-b09, Oracle Corporation', 'AMD64', 'AMD64 Family 16 Model 5 Stepping 2, AuthenticAMD')
>>> print platform.uname()[1].decode(sys.getfilesystemencoding())
先知_MICAH
History
Date                 User        Action  Args
2017-09-05 20:51:20  zyasoft     set     status: pending -> closed
2017-07-20 07:49:22  jeff.allen  set     status: open -> pending; messages: + msg11483
2017-07-18 20:57:17  jeff.allen  set     messages: + msg11480
2017-07-18 07:47:03  jeff.allen  create