Message7988

Author gsnedders
Recipients amak, fwierzbicki, gsnedders, jeff.allen
Date 2013-04-07.15:06:02
Message-id <1365347162.68.0.0505202254274.issue1836@psf.upfronthosting.co.za>
In-reply-to
Content
Python 2 doesn't define the unicode type as a UCS-2 or UTF-32 string; it defines it as a sequence of code units: "The items of a Unicode object are Unicode code units. A Unicode code unit is represented by a Unicode object of one item and can hold either a 16-bit or 32-bit value representing a Unicode ordinal (the maximum value for the ordinal is given in sys.maxunicode, and depends on how Python is configured at compile time)."

As such, the validity constraints of UCS-2, UTF-16, and UTF-32 do not apply here: the type is none of these, but rather an abstract sequence of code units. The definition places no constraints on which Unicode ordinals (which I take to mean code points) are valid.
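As a rough illustration of the quoted passage, the sketch below reads sys.maxunicode to tell which item width an interpreter uses. The helper name item_width_bits is made up for this example. Note that on CPython 3.3 and later sys.maxunicode is always 0x10FFFF, so the 16-bit branch only matters for narrow Python 2 (and older Python 3) builds.

```python
import sys

def item_width_bits(max_ordinal=None):
    """Return 16 for a narrow build, 32 for a wide one (hypothetical helper)."""
    if max_ordinal is None:
        max_ordinal = sys.maxunicode
    # Narrow builds report 0xFFFF; wide builds report 0x10FFFF.
    return 16 if max_ordinal <= 0xFFFF else 32

# Width of the items in this interpreter's unicode/str type.
width = item_width_bits()
```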

The Python 3 definition, for what it's worth, is clearer about what is allowed: "A string is a sequence of values that represent Unicode codepoints. All the codepoints in range U+0000 - U+10FFFF can be represented in a string." This makes it clear that lone surrogates are valid.
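A minimal sketch of what that means in practice, assuming CPython 3 (the variable names are only for illustration): a lone surrogate is accepted as a string item, even though the strict UTF-8 codec refuses to encode it.

```python
# A lone (unpaired) high surrogate, U+D800, held as a single string item.
lone = "\ud800"

length = len(lone)    # number of items in the string
ordinal = ord(lone)   # the code point stored in that item

# The str type accepts the surrogate; it is the strict codec that rejects it.
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# The "surrogatepass" error handler lets the code point through unchanged.
passed = lone.encode("utf-8", "surrogatepass")
```

So the constraint lives in the encoding step, not in the string type itself.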