Message1697

Author	pekka.klarck
Date	2007-07-03.08:01:13

Content

At least when decoding an invalid UTF-8 byte to unicode, you get back a unicode object whose repr has a weird 'uu' prefix. This is illustrated by the example below.

Jython 2.2rc1 on java1.5.0_11
Type "copyright", "credits" or "license" for more information.
>>> u = '\xFF'.decode('utf-8', 'replace')
>>> u
uu'\uFFFD'
>>> type(u)
<type 'unicode'>
>>> print u
?
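For comparison, here is a minimal regression-check sketch of the behaviour one would expect: the invalid byte is replaced by U+FFFD and the repr carries at most a single 'u' prefix. It assumes a Python that supports b'' literals (CPython 2.6+ or 3.x), so it will not run unchanged on Jython 2.2. The function name decode_invalid_byte is hypothetical and used only for illustration.

```python
def decode_invalid_byte():
    """Hypothetical helper: decode a lone 0xFF byte, which is never valid UTF-8."""
    # With errors='replace', the bad byte becomes U+FFFD REPLACEMENT CHARACTER.
    return b'\xff'.decode('utf-8', 'replace')

u = decode_invalid_byte()
assert u == u'\ufffd'
# The repr should have at most one 'u' prefix (Python 3 shows none), never 'uu'.
assert not repr(u).startswith('uu')
print(repr(u))
```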

There is also some discussion about this on the Jython users mailing list from the beginning of July 2007, as a sub-thread of the "character encoding issues" thread. The following link should point to my mail about it.

http://sourceforge.net/mailarchive/message.php?msg_name=f5f747f10707020428t479239cdsa139465fffdfc87%40mail.gmail.com

History
Date	User	Action	Args
2008-02-20 17:17:52	admin	link	issue1746957 messages
2008-02-20 17:17:52	admin	create