Issue2649

classification
Title: __repr__ of Java Map having unicode keys crashes with PyString error
Type: behaviour Severity: normal
Components: Core Versions: Jython 2.7
Milestone: Jython 2.7.2
process
Status: pending Resolution: fixed
Dependencies: Superseder:
Assigned To: jeff.allen Nosy List: jeff.allen, psykiatris
Priority: normal Keywords:

Created on 2017-12-04.07:20:45 by psykiatris, last changed 2019-03-28.09:23:02 by jeff.allen.

Files
File name Uploaded Description
showname2.py psykiatris, 2017-12-04.07:20:44 Source code
displaymonths.py psykiatris, 2017-12-05.17:40:46 Searches locales and prints the months in that language.
unnamed psykiatris, 2018-02-28.02:02:14
Messages
msg11686 (view) Author: Patrick Palczewski (psykiatris) Date: 2017-12-04.07:20:43
Playing with Jython 2.7.2a1+

I don't know if this is a Jython issue or Java issue. 

Basically, my script creates a GregorianCalendar object and attempts to show the names of the months in a given language. It works as expected for most locales. But when I pass a graphical language (like Korean) or one with extended characters (like Vietnamese), it fails with the following error:

===================================
./jython showname2.py
ko 12  <<<<<<< starts off fine...
Traceback (most recent call last):
  File "showname2.py", line 13, in <module>
    print loc, x, gc.getDisplayNames(2,2,loc)
java.lang.IllegalArgumentException: Cannot create PyString with non-byte value
====================

If I change the method to getDisplayName(2,2,loc), it works correctly, showing the graphics for the current month.

I figure the code that works for getDisplayName should also be in getDisplayNames. I can probably review it and correct it, but while I'm familiar with committing and pushing my own code, I get nervous mucking with someone else's.
msg11687 (view) Author: Patrick Palczewski (psykiatris) Date: 2017-12-04.16:30:32
I tried doing a workaround for this issue, because I wanted to be able to display all the names of the months to build a calendar grid. I noticed that in ASCII, getDisplayNames() returned a dictionary with the months as keys:
{June: 5, October: 9, December: 11, May: 4, September: 8, March: 2, July: 6, January: 0, February: 1, April: 3, August: 7, November: 10}

Therefore, the following attempt:

=======================================
>>> months = {}
>>> months.update(gc.getDisplayNames(2,2,loc))
>>> months
{u'\u13a0\u13c5\u13f1': 2, u'\u13ab\u13f0\u13c9\u13c2': 6, u'\u13a0\u13c2\u13cd\u13ac\u13d8': 4, u'\u13d5\u13ad\u13b7\u13f1': 5, u'\u13a4\u13c3\u13b8\u13d4\u13c5': 0, u'\u13da\u13c2\u13c5\u13d7': 9, u'\u13a5\u13cd\u13a9\u13f1': 11, u'\u13a7\u13ec\u13c2': 3, u'\u13c5\u13d3\u13d5\u13c6': 10, u'\u13da\u13b5\u13cd\u13d7': 8, u'\u13a7\u13a6\u13b5': 1, u'\u13a6\u13b6\u13c2': 7}
>>> print months
{u'\u13a0\u13c5\u13f1': 2, u'\u13ab\u13f0\u13c9\u13c2': 6, u'\u13a0\u13c2\u13cd\u13ac\u13d8': 4, u'\u13d5\u13ad\u13b7\u13f1': 5, u'\u13a4\u13c3\u13b8\u13d4\u13c5': 0, u'\u13da\u13c2\u13c5\u13d7': 9, u'\u13a5\u13cd\u13a9\u13f1': 11, u'\u13a7\u13ec\u13c2': 3, u'\u13c5\u13d3\u13d5\u13c6': 10, u'\u13da\u13b5\u13cd\u13d7': 8, u'\u13a7\u13a6\u13b5': 1, u'\u13a6\u13b6\u13c2': 7}
>>> print unicode(months)
{u'\u13a0\u13c5\u13f1': 2, u'\u13ab\u13f0\u13c9\u13c2': 6, u'\u13a0\u13c2\u13cd\u13ac\u13d8': 4, u'\u13d5\u13ad\u13b7\u13f1': 5, u'\u13a4\u13c3\u13b8\u13d4\u13c5': 0, u'\u13da\u13c2\u13c5\u13d7': 9, u'\u13a5\u13cd\u13a9\u13f1': 11, u'\u13a7\u13ec\u13c2': 3, u'\u13c5\u13d3\u13d5\u13c6': 10, u'\u13da\u13b5\u13cd\u13d7': 8, u'\u13a7\u13a6\u13b5': 1, u'\u13a6\u13b6\u13c2': 7}
==============================

Using getDisplayName() returns the proper unicode and graphic, but as you can see above, it won't do it for the multiple months. I guess I need to do a loop to print the month names one at a time.

=============================
>>> gc.getDisplayName(2,2,loc)
u'\u13a5\u13cd\u13a9\u13f1'
>>> print gc.getDisplayName(2,2,loc)
ᎥᏍᎩᏱ
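
The loop idea can be sketched as follows. This is only a minimal illustration, not the uploaded displaymonths.py: the helper names months_in_order and show_months are hypothetical, and the sample mapping is copied from the dict output above rather than fetched from a live GregorianCalendar.

```python
from __future__ import print_function

# Sample data copied from the months dict shown in this message; in Jython
# it would come from dict(gc.getDisplayNames(2, 2, loc)).
names = {
    u'\u13a0\u13c5\u13f1': 2, u'\u13ab\u13f0\u13c9\u13c2': 6,
    u'\u13a0\u13c2\u13cd\u13ac\u13d8': 4, u'\u13d5\u13ad\u13b7\u13f1': 5,
    u'\u13a4\u13c3\u13b8\u13d4\u13c5': 0, u'\u13da\u13c2\u13c5\u13d7': 9,
    u'\u13a5\u13cd\u13a9\u13f1': 11, u'\u13a7\u13ec\u13c2': 3,
    u'\u13c5\u13d3\u13d5\u13c6': 10, u'\u13da\u13b5\u13cd\u13d7': 8,
    u'\u13a7\u13a6\u13b5': 1, u'\u13a6\u13b6\u13c2': 7,
}


def months_in_order(name_to_index):
    """Return the month names sorted by calendar index (0 is January)."""
    pairs = sorted(name_to_index.items(), key=lambda item: item[1])
    return [name for name, _index in pairs]


def show_months(name_to_index):
    """Print one month name per line, so each is printed as text, not repr."""
    for name in months_in_order(name_to_index):
        print(name)


ordered = months_in_order(names)
```

Calling show_months(names) on a terminal that can render the script prints the twelve names one per line, which avoids the escaped form that printing the whole mapping produces.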
msg11688 (view) Author: Patrick Palczewski (psykiatris) Date: 2017-12-05.17:40:45
I suppose this issue may be closed. While I still think it should display the languages correctly, the best practice would be to put it into a list and then call it via a loop.

I created a snippet (uploaded) that will search the locales and print the list of months. If anyone can suggest a better way to do this, by all means, let me know. I love learning and I appreciate it.

If you find my snippet useful, awesome!

Thanks!
msg11724 (view) Author: Jeff Allen (jeff.allen) Date: 2018-02-27.23:54:14
Thanks for bringing this up.

This isn't really to do with calendars. gc.getDisplayNames(2,2,loc) returns a java.util.HashMap, and our __repr__ for that is trying to force the Unicode it contains into byte-strings. Simpler demonstration:

>>> from java.util import GregorianCalendar
>>> from java.util import Locale
>>> gc = GregorianCalendar()
>>> names = gc.getDisplayNames(2, 2, Locale.KOREAN)
>>> dict(names)
{u'4\uc6d4': 3, u'3\uc6d4': 2, u'2\uc6d4': 1, u'1\uc6d4': 0, u'12\uc6d4': 11, u'10\uc6d4': 9, u'11\uc6d4': 10, u'9\uc6d4': 8, u'8\uc6d4': 7, u'7\uc6d4': 6, u'6\uc6d4': 5, u'5\uc6d4': 4}

But look at names with print or in the REPL and it blows up.

>>> names
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
java.lang.IllegalArgumentException: Cannot create PyString with non-byte value
        at org.python.core.PyString.<init>(PyString.java:57)
        at org.python.core.PyString.<init>(PyString.java:70)
        at org.python.core.PyString.<init>(PyString.java:74)
        at org.python.core.JavaProxyMap$2.__call__(JavaProxyMap.java:115)
        at org.python.core.PyObjectDerived.__repr__(PyObjectDerived.java:104)
...
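
The failure mode can be imitated in plain Python. This is a rough sketch under an assumption drawn from the error message: that a PyString rejects any character with a code point above 255. The names to_byte_string, escaped and safe_map_repr are hypothetical and are not Jython API; the point is only that escaping each key before assembling the text keeps the result within byte range.

```python
def to_byte_string(text):
    """Imitate the assumed PyString check: reject characters above 0xFF."""
    if any(ord(ch) > 0xFF for ch in text):
        raise ValueError("Cannot create byte string with non-byte value")
    return text


def escaped(text):
    """Write non-ASCII characters as backslash escapes (quotes are left alone,
    which is enough for this illustration)."""
    return text.encode('unicode_escape').decode('ascii')


def safe_map_repr(mapping):
    """Assemble a map repr from escaped keys so the byte check passes."""
    pairs = sorted(mapping.items(), key=lambda item: item[1])
    items = ["u'%s': %r" % (escaped(key), value) for key, value in pairs]
    return to_byte_string("{" + ", ".join(items) + "}")


# Two of the Korean month names from the demonstration above.
sample = {u'1\uc6d4': 0, u'2\uc6d4': 1}
text = safe_map_repr(sample)
```

Passing a raw key such as u'1\uc6d4' straight to to_byte_string raises, which mirrors what the traceback reports for the map's __repr__.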
msg11725 (view) Author: Patrick Palczewski (psykiatris) Date: 2018-02-28.02:02:15
hi Jeff,

I haven't learned enough about the Java HashMap yet. I ended up coding a workaround that worked, but really it was for my own use, to ensure the months displayed properly in other languages.

I appreciate the reply.

Regards,

msg12408 (view) Author: Jeff Allen (jeff.allen) Date: 2019-03-28.09:23:02
I claim to have fixed this (for now) at: https://hg.python.org/jython/rev/d24e20f4fc1e

Working on this I'm reminded that the way we manage __str__ and __repr__ (and __unicode__ and bytes/String confusion generally) is still quite fragile in Jython. Although Patrick's solutions in this area of Jython didn't all seem quite right, drawing attention to the divergence and its root cause is a significant contribution (thanks).

In CPython there is a PyObject_Repr (https://github.com/python/cpython/blob/2.7/Objects/object.c#L364) that deals with __repr__ implementations and always returns a str. I think this is what we needed here (and other places) and as such I'm not wholly content with my fix.

It is interesting that object.c does *not* contain the implementation of Python object. That is tucked inside typeobject.c (https://github.com/python/cpython/blob/2.7/Objects/typeobject.c#L3681). Instead, object.c contains (in Java terms) a sort of abstract base class.

This is worth considering as a design idea, but a less radical approach is to define Py.repr() that employs the same logic, and use it wherever we definitely want the bytes representation of an object, whatever an actual __repr__ throws at us. I think this may depend on further straightening out the __str__/__repr__ thing, and I don't really want to delay 2.7.2 for it.
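
To make the Py.repr() idea concrete, here is a small Python model of the logic. It is a sketch of the idea only, not the fix in the changeset above and not a transcription of PyObject_Repr: the name bytes_repr is hypothetical, and the backslashreplace error handler is a choice made for this sketch so that a text result never fails to encode.

```python
TEXT_TYPE = type(u'')  # unicode on Python 2, str on Python 3


def bytes_repr(obj):
    """Return a bytes repr of obj, whatever type its __repr__ hands back."""
    result = type(obj).__repr__(obj)
    if isinstance(result, bytes):
        # Already a byte string: pass it through unchanged.
        return result
    if isinstance(result, TEXT_TYPE):
        # Text result: escape anything outside ASCII instead of failing.
        return result.encode('ascii', 'backslashreplace')
    raise TypeError("__repr__ returned non-string (type %s)"
                    % type(result).__name__)


class KoreanMonth(object):
    """Hypothetical class whose __repr__ returns non-ASCII text."""

    def __repr__(self):
        return u'<month 1\uc6d4>'


month_repr = bytes_repr(KoreanMonth())
```

A helper of this shape could be called wherever a bytes representation is definitely wanted, leaving individual __repr__ implementations free to return text.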
History
Date User Action Args
2019-03-28 09:23:02 jeff.allen set status: open -> pending
    assignee: jeff.allen
    resolution: accepted -> fixed
    messages: + msg12408
2018-02-28 02:02:15 psykiatris set files: + unnamed
    messages: + msg11725
2018-02-27 23:54:15 jeff.allen set severity: minor -> normal
    title: GregorianCalendar.getDisplayNames() crashes with PyString error -> __repr__ of Java Map having unicode keys crashes with PyString error
    nosy: + jeff.allen
    messages: + msg11724
    priority: normal
    resolution: accepted
2017-12-05 17:40:47 psykiatris set files: + displaymonths.py
    messages: + msg11688
    severity: normal -> minor
2017-12-04 16:30:32 psykiatris set messages: + msg11687
2017-12-04 07:20:45 psykiatris create