Message4761
This is a fundamental design decision: we do not allow for isolated half
surrogates in Jython, since we use the same underlying representation as
Java, UTF-16, for our unicode strings. In Jython, unicode is just a
wrapper around java.lang.String.
Wikipedia succinctly describes the issue here: "All possible code points
from U+0000 through U+10FFFF, except for the surrogate code points
U+D800–U+DFFF (which are not characters), are uniquely mapped by UTF-16
regardless of the code point's current or future character assignment or
use." (http://en.wikipedia.org/wiki/UTF-16).
So the workaround is to special-case the range \uD800-\uDFFF for Jython,
instead of using a regex as in msg4625.
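A minimal sketch of what such a special case could look like, assuming the check is done on integer code points rather than through a regex character class. The names is_surrogate and to_surrogate_pair are hypothetical and are not taken from msg4625 or from Jython itself. The second helper shows why a lone half is not a character on its own: UTF-16 only uses the surrogate range in pairs, to encode code points above U+FFFF.

```python
def is_surrogate(code_point):
    """Return True if code_point lies in the surrogate range U+D800..U+DFFF."""
    return 0xD800 <= code_point <= 0xDFFF


def to_surrogate_pair(code_point):
    """Return the (high, low) UTF-16 surrogate pair for a code point
    in the range U+10000..U+10FFFF."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


if __name__ == "__main__":
    high, low = to_surrogate_pair(0x1F600)
    print("U+1F600 -> %04X %04X" % (high, low))
```

Both halves returned by to_surrogate_pair satisfy is_surrogate, so code that must behave the same on Jython and CPython can skip or reject that range up front instead of trying to build a string that contains an isolated half.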
Similar considerations would apply for other Unicode usage in CPython,
notably UCS2 vs UCS4.
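For the CPython side, a rough sketch of how code could tell the two builds apart, assuming it only needs to know whether strings are stored as 16-bit units. The helper name is_narrow_build is hypothetical. sys.maxunicode is 0xFFFF on a UCS2 (narrow) build and 0x10FFFF on a UCS4 (wide) build; CPython 3.3 and later always report 0x10FFFF.

```python
import sys


def is_narrow_build():
    """Return True on a UCS2 (narrow) build, where code points above
    U+FFFF are stored as two surrogate code units."""
    return sys.maxunicode == 0xFFFF


if __name__ == "__main__":
    print("narrow build: %s" % is_narrow_build())
```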
A similar problem was seen in Pygments: http://dev.pocoo.org/projects/pygments/ticket/358

Date | User | Action | Args
2009-05-30 00:45:58 | zyasoft | set | messageid: <1243644358.93.0.471744870654.issue1335@psf.upfronthosting.co.za>
2009-05-30 00:45:58 | zyasoft | set | recipients: + zyasoft, dmbaggett
2009-05-30 00:45:58 | zyasoft | link | issue1335 messages
2009-05-30 00:45:57 | zyasoft | create |