Message 6430

Author amak
Recipients amak, yyamano
Date 2011-03-12.13:56:26
SpamBayes Score 1.2533358e-10
Marked as misclassified No
Message-id <1299938187.07.0.445340464256.issue1707@psf.upfronthosting.co.za>
In-reply-to
Content
It's arguable that the CPython behaviour is wrong in this case.

Why would you want to handle an unpaired surrogate?

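Note that Java will not permit this: when decoding bytes that would yield an unpaired surrogate, it substitutes a replacement character instead. Consider the following code: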

Jython 2.5.2 (Release_2_5_2:7206, Mar 2 2011, 23:12:06)
[Java HotSpot(TM) Client VM (Sun Microsystems Inc.)] on java1.5.0_21
Type "help", "copyright", "credits" or "license" for more information.
>>> import java
>>> import jarray
>>> bytes = [-34, -18]
>>> byte_array = jarray.array(bytes, 'b')
>>> java_string = java.lang.String(byte_array, "UTF-16")
>>> jython_string = unicode(java_string)
>>> jython_string
u'\ufffd'

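Note that the result is u"\ufffd", which is a "Replacement Character". With no byte-order mark, Java decodes "UTF-16" as big-endian, so the bytes 0xDE 0xEE (-34 and -18 as signed Java bytes) form the single code unit 0xDEEE. That is a low surrogate with no high surrogate before it, and Java replaces it with U+FFFD rather than passing it through.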
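The same decoding step can be reproduced in CPython 3 as a rough sketch. This is modern Python 3, not the 2011 CPython 2 behaviour the thread is about. It assumes Java's big-endian default, which is why it uses the "utf-16-be" codec: plain "utf-16" in Python falls back to the machine's native byte order when there is no BOM, so on a little-endian machine it would read these bytes as U+EEDE instead. The helper `surrogate_kind` is hypothetical. The sketch contrasts three policies: strict decoding raises, errors="replace" matches Java's U+FFFD, and errors="surrogatepass" (Python 3.4+) keeps the unpaired surrogate, which is the kind of handling questioned above.

```python
# Sketch: reproduce the Jython session's decoding step in CPython 3.

# Java's byte type is signed, so [-34, -18] are the unsigned bytes 0xDE 0xEE.
java_bytes = [-34, -18]
raw = bytes(b & 0xFF for b in java_bytes)

# With no byte-order mark, Java's "UTF-16" charset reads big-endian.
code_unit = (raw[0] << 8) | raw[1]


def surrogate_kind(unit: int) -> str:
    """Classify a 16-bit UTF-16 code unit (hypothetical helper)."""
    if 0xD800 <= unit <= 0xDBFF:
        return "high surrogate"
    if 0xDC00 <= unit <= 0xDFFF:
        return "low surrogate"
    return "not a surrogate"


print(hex(code_unit), surrogate_kind(code_unit))  # 0xdeee low surrogate

# Strict decoding rejects the lone surrogate.
try:
    raw.decode("utf-16-be")
except UnicodeDecodeError as exc:
    print("strict: error,", exc.reason)

# errors="replace" substitutes U+FFFD, as Java does.
print("replace:", ascii(raw.decode("utf-16-be", errors="replace")))

# errors="surrogatepass" (Python 3.4+) keeps the unpaired surrogate as-is.
print("surrogatepass:", ascii(raw.decode("utf-16-be", errors="surrogatepass")))
```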

"Replacement Character: A character used as a substitute for an uninterpretable character from another encoding. The Unicode Standard uses U+FFFD  replacement character for this function."

http://unicode.org/glossary/
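Applied to UTF-16, the glossary's substitution can be illustrated with a minimal Python 3 sketch. The function `decode_utf16be_replacing` is hypothetical and is not Java's or CPython's actual decoder: it splits big-endian input into 16-bit code units, combines each well-formed high/low surrogate pair into one supplementary code point, and emits U+FFFD for every unpaired surrogate. As a simplifying assumption it rejects odd-length input instead of replacing a trailing half code unit.

```python
def decode_utf16be_replacing(data: bytes) -> str:
    """Decode big-endian UTF-16, replacing each unpaired surrogate with U+FFFD.

    Hypothetical helper for illustration only; odd-length input is rejected
    instead of being replaced, which real decoders may handle differently.
    """
    if len(data) % 2:
        raise ValueError("UTF-16 input must contain an even number of bytes")
    # Split the byte string into 16-bit big-endian code units.
    units = [(data[i] << 8) | data[i + 1] for i in range(0, len(data), 2)]
    out = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # A well-formed pair encodes one code point above U+FFFF.
            low = units[i + 1]
            out.append(chr(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)))
            i += 2
        elif 0xD800 <= unit <= 0xDFFF:
            # Unpaired high or low surrogate: substitute the replacement character.
            out.append("\ufffd")
            i += 1
        else:
            # Any other code unit maps directly to a BMP code point.
            out.append(chr(unit))
            i += 1
    return "".join(out)


print(ascii(decode_utf16be_replacing(bytes([0xDE, 0xEE]))))  # '\ufffd'
print(ascii(decode_utf16be_replacing(bytes([0xD8, 0x3D, 0xDE, 0x00]))))  # '\U0001f600'
```

Because each unpaired surrogate becomes exactly one U+FFFD and decoding resumes at the next code unit, an ordinary character after a stray high surrogate survives: the bytes D8 00 00 41 decode to "\ufffdA".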
History
Date                 User  Action  Args
2011-03-12 13:56:27  amak  set     messageid: <1299938187.07.0.445340464256.issue1707@psf.upfronthosting.co.za>
2011-03-12 13:56:27  amak  set     recipients: + amak, yyamano
2011-03-12 13:56:26  amak  link    issue1707 messages
2011-03-12 13:56:26  amak  create