Message6430

Author	amak
Recipients	amak, yyamano
Date	2011-03-12.13:56:26
SpamBayes Score	1.2533358e-10
Marked as misclassified	No
Message-id	<1299938187.07.0.445340464256.issue1707@psf.upfronthosting.co.za>
In-reply-to

Content

It's arguable that the CPython behaviour is wrong in this case.

Why would you want to handle an unpaired surrogate?

Note that Java will not permit this. Consider the following code:

Jython 2.5.2 (Release_2_5_2:7206, Mar 2 2011, 23:12:06)
[Java HotSpot(TM) Client VM (Sun Microsystems Inc.)] on java1.5.0_21
Type "help", "copyright", "credits" or "license" for more information.
>>> import java
>>> import jarray
>>> bytes = [-34, -18]
>>> byte_array = jarray.array(bytes, 'b')
>>> java_string = java.lang.String(byte_array, "UTF-16")
>>> jython_string = unicode(java_string)
>>> jython_string
u'\ufffd'

Note that the result is u"\ufffd", which is the Unicode "Replacement Character". The Unicode glossary defines it as follows:

"Replacement Character: A character used as a substitute for an uninterpretable character from another encoding. The Unicode Standard uses U+FFFD replacement character for this function."

http://unicode.org/glossary/
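
For anyone who wants to check why those two bytes count as an unpaired surrogate, here is a rough sketch. It is written for Python 3 rather than the Jython 2.5 session above, and the helper name is_low_surrogate is made up for illustration. It assumes big-endian byte order, which is what Java's "UTF-16" charset uses when decoding input that has no byte order mark.

```python
# Signed Java-style byte values taken from the Jython session above.
signed_bytes = [-34, -18]

# Map each signed byte to its unsigned value (0..255): -34 -> 0xDE, -18 -> 0xEE.
raw = bytes(b & 0xFF for b in signed_bytes)

# Assumption: big-endian, as Java's "UTF-16" decoder does without a BOM.
code_unit = int.from_bytes(raw, "big")


def is_low_surrogate(unit):
    """Hypothetical helper: True if a 16-bit code unit is a low (trailing) surrogate."""
    return 0xDC00 <= unit <= 0xDFFF


# Decoding with errors="replace" substitutes U+FFFD for the undecodable unit,
# which mirrors the result the Java String constructor produced.
decoded = raw.decode("utf-16-be", errors="replace")

print(hex(code_unit), is_low_surrogate(code_unit), ascii(decoded))
```

The single code unit is 0xDEEE, a low surrogate with no preceding high surrogate, so there is no valid character for a decoder to produce from it.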

History
Date	User	Action	Args
2011-03-12 13:56:27	amak	set	messageid: <1299938187.07.0.445340464256.issue1707@psf.upfronthosting.co.za>
2011-03-12 13:56:27	amak	set	recipients: + amak, yyamano
2011-03-12 13:56:26	amak	link	issue1707 messages
2011-03-12 13:56:26	amak	create