Message7988
Python 2 doesn't define the unicode type as a UCS-2 or UTF-32 string; it defines it as a sequence of code units: "The items of a Unicode object are Unicode code units. A Unicode code unit is represented by a Unicode object of one item and can hold either a 16-bit or 32-bit value representing a Unicode ordinal (the maximum value for the ordinal is given in sys.maxunicode, and depends on how Python is configured at compile time)."
As such, the validity constraints of UCS-2 and UTF-32 (and UTF-16) do not apply here. The type is none of these encodings; it is an abstract sequence of code units, and the definition places no constraints on which Unicode ordinals (which I take to mean codepoints) are valid.
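To make the distinction concrete, here is a minimal Python 3 sketch that models a narrow (16-bit) build's view of a string as a list of code units, plus a check for the UTF-16 well-formedness rule that the quoted definition does not impose. Both function names are hypothetical, and encoding with `utf-16-le` is only a stand-in for how a narrow build stores its items, not Python 2's actual implementation.

```python
from typing import List, Sequence


def narrow_build_items(s: str) -> List[int]:
    """Hypothetical model of a narrow build's unicode items: 16-bit code units.

    Characters above U+FFFF occupy two items (a surrogate pair).
    """
    # "surrogatepass" keeps lone surrogates instead of raising, since the
    # abstract sequence-of-code-units type does not reject them either.
    data = s.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_well_formed_utf16(units: Sequence[int]) -> bool:
    """Return True only if every surrogate is part of a high+low pair."""
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:  # high surrogate: must be followed by a low one
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return False
        if 0xDC00 <= u <= 0xDFFF:  # low surrogate with no preceding high one
            return False
        i += 1
    return True


items = narrow_build_items("a\U0001F600")
print([hex(u) for u in items])          # ['0x61', '0xd83d', '0xde00']
print(is_well_formed_utf16(items))      # True
print(is_well_formed_utf16(items[:2]))  # False: the slice ends in a lone high surrogate
```

Slicing the item list between the two halves of a pair leaves a lone high surrogate. On a narrow build, slicing a unicode object at the same index does the same, and the result is still a legal unicode value under the definition quoted above.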
The Python 3 definition, for what it's worth, is clearer about what is allowed: "A string is a sequence of values that represent Unicode codepoints. All the codepoints in range U+0000 - U+10FFFF can be represented in a string." This makes it clear that lone surrogates are valid, since the surrogate codepoints U+D800 - U+DFFF fall within that range.
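A short Python 3 check of the quoted definition: `chr()` accepts every codepoint up to U+10FFFF, including a lone surrogate, while strict encoding to UTF-8 rejects the surrogate. The helper names `can_store` and `encodes_strictly` are illustrative only.

```python
import sys


def can_store(cp: int) -> bool:
    """Return True if chr() accepts cp, i.e. cp is in U+0000..U+10FFFF."""
    try:
        chr(cp)
        return True
    except ValueError:
        return False


def encodes_strictly(s: str, encoding: str = "utf-8") -> bool:
    """Return True if s encodes without an error handler."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


lone = "\ud800"  # a lone high surrogate is a perfectly legal str
print(hex(sys.maxunicode))                       # 0x10ffff
print(len(lone), hex(ord(lone)))                 # 1 0xd800
print(can_store(0x10FFFF), can_store(0x110000))  # True False
print(encodes_strictly(lone))                    # False: surrogates not allowed
print(lone.encode("utf-8", "surrogatepass"))     # b'\xed\xa0\x80'
```

The string type itself accepts the surrogate; only the codec enforces well-formedness, which is the separation the Python 3 definition describes.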

Date | User | Action | Args
2013-04-07 15:06:02 | gsnedders | set | messageid: <1365347162.68.0.0505202254274.issue1836@psf.upfronthosting.co.za>
2013-04-07 15:06:02 | gsnedders | set | recipients: + gsnedders, fwierzbicki, amak, jeff.allen
2013-04-07 15:06:02 | gsnedders | link | issue1836 messages
2013-04-07 15:06:02 | gsnedders | create |