Message5863
Hi,
Coercion to and from Unicode should always honor the default encoding, which is typically set by calling sys.setdefaultencoding(name) in sitecustomize.py.
Here is what happens with utf-8 as the default encoding:
1. explicit str-unicode coercion works correctly: unicode("utf-8 string") -> the utf-8 decoder is used. yes!
2. explicit unicode-str coercion is wrong: str(u"unicode string") -> the utf-8 encoder is not used. Is the latin-1 one used instead?
3. implicit str-unicode coercion is wrong: java.lang.String("utf-8 string") -> the utf-8 decoder is not used.
4. implicit unicode-str coercion does it wrong: str(java.lang.Object) or str(java.lang.Object.toString()) -> utf-8 encoder is not used.
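To make the difference concrete, here is a minimal sketch that does not depend on Jython. It only shows how the utf-8 and latin-1 codecs disagree on a non-ASCII character. The sample string is an assumption chosen for illustration, and the sketch does not prove that latin-1 is the codec Jython actually falls back to.

```python
# Illustrative sample: "cafe" with an acute accent on the last letter (U+00E9).
text = u"caf\u00e9"

# unicode -> str direction (cases 2 and 4): the two encoders give different bytes.
utf8_bytes = text.encode("utf-8")        # 5 bytes, the last character takes two
latin1_bytes = text.encode("latin-1")    # 4 bytes, the last character takes one

# str -> unicode direction (cases 1 and 3): decoding utf-8 bytes with the
# wrong codec does not raise, it silently produces garbled characters.
good = utf8_bytes.decode("utf-8")
mojibake = utf8_bytes.decode("latin-1")

print(repr(utf8_bytes), repr(latin1_bytes))
print(repr(good), repr(mojibake))
```

If the bytes produced by str() on a unicode object match the latin-1 form rather than the utf-8 form, that would be consistent with the guess in case 2.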
Quick workaround for ^3. and ^4., in the form of a custom hook in sitecustomize.py:
----
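import os
import sys

# sitecustomize.py is imported while sys.setdefaultencoding is still available.
sys.setdefaultencoding("utf-8")
if os.name == "java":  # only when running under Jython
    from java.lang import Object

    def utf8_str(obj, orig_str=str):
        # Encode unicode objects and Java objects as utf-8 explicitly.
        if isinstance(obj, unicode):
            return obj.encode("utf-8")
        if isinstance(obj, Object):
            return obj.toString().encode("utf-8")
        # Everything else goes through the original str builtin.
        return orig_str(obj)

    # Replace the str builtin with the wrapper.
    sys.builtins["str"] = utf8_str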
----
Thanks for your attention,
Regis

Date | User | Action | Args
2010-06-30 11:26:57 | rdesgroppes | set | recipients: + rdesgroppes
2010-06-30 11:26:57 | rdesgroppes | set | messageid: <1277897217.13.0.83259667503.issue1625@psf.upfronthosting.co.za>
2010-06-30 11:26:56 | rdesgroppes | link | issue1625 messages
2010-06-30 11:26:55 | rdesgroppes | create |