Issue1625
Created on 2010-06-30.11:26:57 by rdesgroppes, last changed 2013-02-20.18:31:02 by fwierzbicki.
msg5863 | Author: Régis Desgroppes (rdesgroppes) | Date: 2010-06-30.11:26:55
Hi,
Coercion from/to Unicode should always consider default encoding, typically specified after a call to sys.setdefaultencoding(name) in sitecustomize.py.
Here with utf-8:
1. explicit str-unicode coercion does it well: unicode("utf-8 string") -> utf-8 decoder is used. yes!
2. explicit unicode-str coercion does it wrong: str(u"unicode string") -> utf-8 encoder is not used. latin-1 one?
3. implicit str-unicode coercion does it wrong: java.lang.String("utf-8 string") -> utf-8 decoder is not used.
4. implicit unicode-str coercion does it wrong: str(java.lang.Object) or str(java.lang.Object.toString()) -> utf-8 encoder is not used.
Quick workaround for ^3. and ^4., in the form of a custom hook in sitecustomize.py:
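----
import os
import sys

sys.setdefaultencoding("utf-8")

if os.name == "java":
    from java.lang import Object

    def utf8_str(obj, orig_str=str):
        # Encode unicode and Java objects as UTF-8; defer to the original str otherwise
        if isinstance(obj, unicode):
            return obj.encode("utf-8")
        if isinstance(obj, Object):
            return obj.toString().encode("utf-8")
        return orig_str(obj)

    # Replace the built-in str with the UTF-8-aware wrapper
    sys.builtins["str"] = utf8_str
----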
Thanks for your attention,
Regis
|
msg5929 | Author: Philip Jenvey (pjenvey) | Date: 2010-07-27.23:58:05
I think we have all this correct in 2.5.1 (or at least on trunk).
You're wrong about #2:
Jython 2.5.2b1 (trunk:7081M, Jul 27 2010, 16:21:32)
[Java HotSpot(TM) 64-Bit Server VM (Apple Inc.)] on java1.6.0_17
>>> import sys; sys.setdefaultencoding('latin1')
>>> str(u'日本')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: ordinal not in range(256)
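The same check can be made without changing the default encoding, by naming the codec explicitly. This is only a minimal sketch in plain Python: `try_encode` is a hypothetical helper introduced for illustration, not something Jython provides, and the escapes `\u65e5\u672c` stand for the two characters used in the session above.

```python
# Sketch only: try_encode is a hypothetical helper, not part of Jython.
TEXT = u"\u65e5\u672c"  # the two characters from the session above


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot represent text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# latin-1 only covers code points below 256, so this encode fails
latin1_result = try_encode(TEXT, "latin-1")

# utf-8 can represent every code point, so this encode succeeds
utf8_result = try_encode(TEXT, "utf-8")
```

Here `latin1_result` is `None` while `utf8_result` holds the UTF-8 bytes, which is consistent with the traceback naming the 'latin-1' codec once that encoding has been set as the default.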
As for #3, our str and unicode objects are backed by java.lang.String. In that respect it makes sense that calling java.lang.String on 'foo' or on u'foo' returns the underlying String object, sans any conversion.
For #4, str(obj) on arbitrary objects in plain Python always returns the result of obj.__str__. Arbitrary Java objects in Jython by default have __str__ methods which return the result of their toString method. Hence that result.
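The delegation described for #4 can be pictured with a pure-Python stand-in. This is a sketch under assumptions, not Jython's actual proxy code: `JavaLikeObject` is a hypothetical class that merely mimics a Java object whose `__str__` forwards to `toString`.

```python
# Sketch only: JavaLikeObject is a hypothetical stand-in for a Java object.
class JavaLikeObject(object):
    def __init__(self, text):
        self._text = text

    def toString(self):
        # Plays the role of java.lang.Object.toString()
        return self._text

    def __str__(self):
        # str(obj) calls __str__, which forwards to toString unchanged
        return self.toString()


obj = JavaLikeObject("foo")
result = str(obj)
```

Since `str(obj)` simply returns whatever `__str__` returns, no encoding step appears anywhere in this path unless `__str__` itself performs one.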
|
msg5931 | Author: Régis Desgroppes (rdesgroppes) | Date: 2010-07-28.12:49:52
How impressive was your speed in closing this ticket! With a nice status: "invalid". Thank you very much: that was exactly what Jython users needed.
More seriously, do you consider Jython 2.5.1 obeys Python 2.5 convention about implicit coercion from/to Unicode?
Do you really think the fact you're using java.lang.String as internal storage for both str and unicode is an acceptable explanation?
|
msg5932 | Author: Philip Jenvey (pjenvey) | Date: 2010-07-28.20:12:51
I don't mean to completely dismiss your bug report: I generally close them when I've personally deemed them invalid, but users can always reopen if they disagree. That's for the sake of keeping the tracker organized. We get a lot of bug reports where the reporters disappear when we're expecting more correspondence.
So we can agree #1 and #2 aren't problematic. Can we also agree that #3 doesn't really relate to any Python 2.5 convention? java.lang.String is a completely different beast in this respect. Maybe you can describe the use case you're running into with its current behavior.
When I think about #4 again, it's somewhat related to the recently fixed #1563. What you're really suggesting is that we make __str__ on Java types use encode(). With #1563 that's a reasonable request, though I'm concerned it could break something.
|
msg7725 | Author: Frank Wierzbicki (fwierzbicki) | Date: 2013-02-20.18:31:02
Hmmm no response for two years so I think we can close this.
|
Date | User | Action | Args
2013-02-20 18:31:02 | fwierzbicki | set | status: open -> closed; resolution: out of date; messages: + msg7725; nosy: + fwierzbicki; versions: + Jython 2.5
2010-07-28 20:12:53 | pjenvey | set | status: closed -> open; resolution: invalid -> (no value); messages: + msg5932
2010-07-28 12:49:53 | rdesgroppes | set | messages: + msg5931
2010-07-27 23:58:07 | pjenvey | set | status: open -> closed; resolution: invalid; messages: + msg5929; nosy: + pjenvey
2010-06-30 11:26:57 | rdesgroppes | create |