Message4761

Author	zyasoft
Recipients	dmbaggett, zyasoft
Date	2009-05-30.00:45:57
Message-id	<1243644358.93.0.471744870654.issue1335@psf.upfronthosting.co.za>
In-reply-to

Content
This is a fundamental design decision: we do not allow for isolated half 
surrogates in Jython, since we use the same underlying representation as 
Java, UTF-16, for our unicode strings. In Jython, unicode is just a 
wrapper around java.lang.String.

Wikipedia succinctly describes the issue here: "All possible code points 
from U+0000 through U+10FFFF, except for the surrogate code points 
U+D800–U+DFFF (which are not characters), are uniquely mapped by UTF-16 
regardless of the code point's current or future character assignment or 
use." (http://en.wikipedia.org/wiki/UTF-16).
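To make the quoted mapping concrete, here is a minimal sketch of how a code point above U+FFFF is split into a high and a low surrogate in UTF-16. The function name to_surrogate_pair is hypothetical and only for illustration; it is not part of Jython or CPython.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into
    its UTF-16 high and low surrogate code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point: %r" % (code_point,))
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits -> U+D800..U+DBFF
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits -> U+DC00..U+DFFF
    return high, low


high, low = to_surrogate_pair(0x10000)
print("%04X %04X" % (high, low))           # D800 DC00
```

A surrogate only has meaning as one half of such a pair, which is why an isolated one cannot be represented as a character.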

So the workaround is to special-case the surrogate range \uD800-\uDFFF 
for Jython, instead of using a regex as in msg4625.
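As a rough sketch of what such a special case might look like (this is an assumption about the shape of the fix, not the actual code from the issue), the surrogate range can be tested with plain integer comparisons on code points, so no pattern containing a lone surrogate ever has to be built. The names below are hypothetical.

```python
# Bounds of the UTF-16 surrogate block, kept as integers so that no
# string containing an isolated surrogate needs to be constructed.
SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF


def is_surrogate_code_point(code_point):
    """Return True if the integer code point lies in U+D800..U+DFFF."""
    return SURROGATE_FIRST <= code_point <= SURROGATE_LAST


def has_surrogate(text):
    """Return True if any element of text is a surrogate code point."""
    return any(is_surrogate_code_point(ord(ch)) for ch in text)
```

Code that previously matched this range with a regex character class could call has_surrogate (or the integer check directly) instead.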

Similar considerations apply to Unicode handling in CPython, notably 
narrow (UCS2) versus wide (UCS4) builds.
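For code that has to behave the same on both kinds of build, one common check is sys.maxunicode, which is 0xFFFF on a narrow (UCS2) CPython build and 0x10FFFF on a wide (UCS4) one. A minimal sketch, with a hypothetical helper name:

```python
import sys


def is_narrow_build():
    """Return True on a narrow (UCS2) build, where code points above
    U+FFFF are stored as surrogate pairs."""
    return sys.maxunicode == 0xFFFF


if is_narrow_build():
    print("narrow build: supplementary characters use surrogate pairs")
else:
    print("wide build: every code point is a single string element")
```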

A similar problem was seen in Pygments, http://dev.pocoo.org/projects/pygments/ticket/358

History
Date	User	Action	Args
2009-05-30 00:45:58	zyasoft	set	messageid: <1243644358.93.0.471744870654.issue1335@psf.upfronthosting.co.za>
2009-05-30 00:45:58	zyasoft	set	recipients: + zyasoft, dmbaggett
2009-05-30 00:45:58	zyasoft	link	issue1335 messages
2009-05-30 00:45:57	zyasoft	create