Message1697

Author pekka.klarck
Date 2007-07-03.08:01:13
Content
At least when decoding an invalid UTF-8 byte to unicode, you get a unicode object whose repr has a weird 'uu' prefix instead of the usual 'u'. The example below illustrates this.

Jython 2.2rc1 on java1.5.0_11
Type "copyright", "credits" or "license" for more information.
>>> u = '\xFF'.decode('utf-8', 'replace')
>>> u
uu'\uFFFD'
>>> type(u)
<type 'unicode'>
>>> print u
?
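
For comparison, here is a minimal sketch of the result I would expect from the same decode call. It assumes an interpreter that accepts both the b'' and u'' literal forms (CPython 2.6+ or 3.3+, so not Jython 2.2 itself), and the variable names are only illustrative.

```python
# Sketch only: expected result of the decode, assuming CPython-like behaviour.
data = b'\xff'  # 0xFF can never appear in well-formed UTF-8

# The 'replace' error handler substitutes U+FFFD (REPLACEMENT CHARACTER)
# for the undecodable byte instead of raising UnicodeDecodeError.
decoded = data.decode('utf-8', 'replace')

is_replacement_char = (decoded == u'\ufffd')
length = len(decoded)

# The repr may start with one 'u' (Python 2) or no prefix (Python 3),
# but it should never start with 'uu'.
has_double_prefix = repr(decoded).startswith('uu')

print(is_replacement_char, length, has_double_prefix)
```

The decoded value itself looks correct in the Jython session above (the type is unicode and printing works), so the problem seems to be limited to how the repr is built.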

There is also some discussion about this on the Jython users mailing list from the beginning of July 2007, in a sub-thread of the "character encoding issues" thread. The following link ought to point to my mail about this.

http://sourceforge.net/mailarchive/message.php?msg_name=f5f747f10707020428t479239cdsa139465fffdfc87%40mail.gmail.com
History
Date User Action Args
2008-02-20 17:17:52 admin link issue1746957 messages
2008-02-20 17:17:52 admin create