Message6430
It's arguable that the CPython behaviour is wrong in this case.
Why would you want to handle an unpaired surrogate?
Note that Java will not permit this. Consider the following code:
Jython 2.5.2 (Release_2_5_2:7206, Mar 2 2011, 23:12:06)
[Java HotSpot(TM) Client VM (Sun Microsystems Inc.)] on java1.5.0_21
Type "help", "copyright", "credits" or "license" for more information.
>>> import java
>>> import jarray
>>> bytes = [-34, -18]
>>> byte_array = jarray.array(bytes, 'b')
>>> java_string = java.lang.String(byte_array, "UTF-16")
>>> jython_string = unicode(java_string)
>>> jython_string
u'\ufffd'
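The two bytes are 0xDE 0xEE, which Java's "UTF-16" charset reads as big-endian when there is no byte order mark, giving the single code unit U+DEEE, an unpaired low surrogate.
Note that the result is u'\ufffd', which is a "Replacement Character".
"Replacement Character: A character used as a substitute for an uninterpretable character from another encoding. The Unicode Standard uses U+FFFD replacement character for this function."
http://unicode.org/glossary/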
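For comparison, here is a rough sketch of the same two bytes decoded with the standard codecs in CPython 3 (an assumption on my part, since the session above is Jython 2.5; the helper names are made up for illustration). It only shows how the error handler decides whether the lone surrogate is rejected, replaced, or kept.

```python
def signed_to_bytes(signed_values):
    """Convert Java-style signed bytes (-128..127) into a Python bytes object."""
    return bytes(v & 0xFF for v in signed_values)


def strict_decode_fails(data):
    """Return True if strict UTF-16 big-endian decoding rejects the data."""
    try:
        data.decode("utf-16-be")
    except UnicodeDecodeError:
        return True
    return False


# Same input as the Jython session: 0xDE 0xEE, i.e. the code unit U+DEEE.
raw = signed_to_bytes([-34, -18])

# The default (strict) handler refuses the unpaired low surrogate.
rejected = strict_decode_fails(raw)

# errors="replace" substitutes U+FFFD, matching what Java produced above.
replaced = raw.decode("utf-16-be", errors="replace")

# errors="surrogatepass" (supported for UTF-16 since Python 3.4) keeps it.
preserved = raw.decode("utf-16-be", errors="surrogatepass")
```

The explicit "utf-16-be" matters here: plain "utf-16" in Python falls back to the platform's native byte order when there is no byte order mark, which need not match Java's big-endian default.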
Date | User | Action | Args
2011-03-12 13:56:27 | amak | set | messageid: <1299938187.07.0.445340464256.issue1707@psf.upfronthosting.co.za>
2011-03-12 13:56:27 | amak | set | recipients: + amak, yyamano
2011-03-12 13:56:26 | amak | link | issue1707 messages
2011-03-12 13:56:26 | amak | create |