Issue1625

classification
Title: implicit coercion from/to Unicode
Type: behaviour Severity: normal
Components: Core Versions: Jython 2.5, 2.5.1
Milestone:
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: fwierzbicki, pjenvey, rdesgroppes
Priority: Keywords:

Created on 2010-06-30.11:26:57 by rdesgroppes, last changed 2013-02-20.18:31:02 by fwierzbicki.

Messages
msg5863 Author: Régis Desgroppes (rdesgroppes) Date: 2010-06-30.11:26:55
Hi,
Coercion from/to Unicode should always consider default encoding, typically specified after a call to sys.setdefaultencoding(name) in sitecustomize.py.

Here with utf-8:
1. explicit str-unicode coercion does it well: unicode("utf-8 string") -> utf-8 decoder is used. yes!
2. explicit unicode-str coercion does it wrong: str(u"unicode string") -> utf-8 encoder is not used. latin-1 one?
3. implicit str-unicode coercion does it wrong: java.lang.String("utf-8 string") -> utf-8 decoder is not used.
4. implicit unicode-str coercion does it wrong: str(java.lang.Object) or str(java.lang.Object.toString()) -> utf-8 encoder is not used.

Quick workaround for ^3. and ^4., in the form of a custom hook in sitecustomize.py:
----
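    import os
    import sys

    # Default encoding used for implicit str/unicode coercion
    sys.setdefaultencoding("utf-8")

    if os.name == "java":  # only true when running under Jython
        from java.lang import Object

        def utf8_str(obj, orig_str=str):
            # Encode unicode and Java objects as UTF-8; defer to the original str otherwise
            if isinstance(obj, unicode):
                return obj.encode("utf-8")
            if isinstance(obj, Object):
                return obj.toString().encode("utf-8")
            return orig_str(obj)

        # Replace the built-in str for the whole interpreter
        sys.builtins["str"] = utf8_str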
----

Thanks for your attention,
Regis
msg5929 Author: Philip Jenvey (pjenvey) Date: 2010-07-27.23:58:05
I think we have all this correct in 2.5.1 (or at least on trunk).

You're wrong about #2:
Jython 2.5.2b1 (trunk:7081M, Jul 27 2010, 16:21:32) 
[Java HotSpot(TM) 64-Bit Server VM (Apple Inc.)] on java1.6.0_17
>>> import sys; sys.setdefaultencoding('latin1')
>>> str(u'日本')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: ordinal not in range(256)
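
The traceback above comes down to codec range: Latin-1 can only represent code points below 256, while UTF-8 covers all of Unicode. A minimal sketch of the same check using explicit `encode()` calls, written for a Python 3 interpreter (where no implicit default-encoding step exists); the helper name `can_encode` is made up for illustration and is not part of Jython or Python:

```python
def can_encode(text, encoding):
    """Return True if every character of `text` is representable in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


sample = u"\u65e5\u672c"  # the two characters used in the session above

latin1_ok = can_encode(sample, "latin-1")  # both code points are >= 256
utf8_ok = can_encode(sample, "utf-8")      # UTF-8 can encode any code point
utf8_bytes = sample.encode("utf-8")        # three bytes per character here
```
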

As for #3, our str and unicode objects are backed by java.lang.String. In that respect it makes sense that java.lang.String('foo') or java.lang.String(u'foo') returns the underlying String object, sans any conversion.

For #4, str(obj) on arbitrary objects in plain Python always returns the result of obj.__str__. Arbitrary Java objects in Jython by default have __str__ methods which return the result of their toString method. Hence that result.
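
The #4 explanation can be illustrated with a small pure-Python stand-in (no Java involved). `FakeJavaObject` is a hypothetical class that only mimics the behaviour described above: its `__str__` returns whatever `toString()` returns, and `str()` hands that result back with no extra encoding step.

```python
class FakeJavaObject(object):
    """Hypothetical stand-in for a Java object as seen from Jython."""

    def __init__(self, label):
        self._label = label

    def toString(self):
        # Plays the role of java.lang.Object.toString()
        return "FakeJavaObject[%s]" % self._label

    def __str__(self):
        # Mirrors the default described above: __str__ delegates to toString
        return self.toString()


obj = FakeJavaObject("demo")
result = str(obj)  # str() simply returns the result of obj.__str__()
```
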
msg5931 Author: Régis Desgroppes (rdesgroppes) Date: 2010-07-28.12:49:52
How impressive was your speed in closing this ticket! With a nice status: "invalid". Thank you very much: that was exactly what Jython users needed.
More seriously, do you consider Jython 2.5.1 obeys Python 2.5 convention about implicit coercion from/to Unicode?
Do you really think the fact you're using java.lang.String as internal storage for both str and unicode is an acceptable explanation?
msg5932 Author: Philip Jenvey (pjenvey) Date: 2010-07-28.20:12:51
I don't mean to completely dismiss your bug report: I generally close them when I've personally deemed them invalid -- but users can always reopen if they disagree. That's for the sake of keeping the tracker organized. We get a lot of bug reports where the reporters disappear when we're expecting more correspondence.

So we can agree #1 and #2 aren't problematic. Can we also agree that #3 doesn't really relate to any Python 2.5 convention? java.lang.String is a completely different beast in this respect. Maybe you can describe the use case you're running into with its current behavior.

When I think about #4 again it's somewhat related to the recently fixed #1563. What you're really suggesting is we make __str__ on Java types use encode(). With #1563 that's a reasonable request, though I'm concerned it could break something.
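
As a rough idea of what "use encode()" could amount to, here is a sketch under Python 3 semantics, where encoded text is `bytes`. It is not the Jython implementation; `encoded_str` and `Point` are hypothetical names, and the dispatch simply mirrors the sitecustomize.py workaround from the first message.

```python
def encoded_str(obj, encoding="utf-8"):
    """Hypothetical helper: return `obj` as bytes in `encoding`."""
    if isinstance(obj, bytes):
        return obj  # already encoded, pass through unchanged
    if isinstance(obj, str):
        return obj.encode(encoding)  # text: encode directly
    return str(obj).encode(encoding)  # anything else: stringify, then encode


class Point(object):
    """Stand-in for an object whose string form contains non-ASCII text."""

    def __str__(self):
        return "caf\u00e9"


as_utf8 = encoded_str(Point())
as_latin1 = encoded_str(Point(), "latin-1")
```

The same object yields different bytes depending on the chosen encoding, which is why existing code that assumes one particular result could break if the default changed.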
msg7725 Author: Frank Wierzbicki (fwierzbicki) Date: 2013-02-20.18:31:02
Hmmm, no response for two years, so I think we can close this.
History
Date                 User         Action  Args
2013-02-20 18:31:02  fwierzbicki  set     status: open -> closed
                                          resolution: out of date
                                          messages: + msg7725
                                          nosy: + fwierzbicki
                                          versions: + Jython 2.5
2010-07-28 20:12:53  pjenvey      set     status: closed -> open
                                          resolution: invalid -> (no value)
                                          messages: + msg5932
2010-07-28 12:49:53  rdesgroppes  set     messages: + msg5931
2010-07-27 23:58:07  pjenvey      set     status: open -> closed
                                          resolution: invalid
                                          messages: + msg5929
                                          nosy: + pjenvey
2010-06-30 11:26:57  rdesgroppes  create