Message9965

Author	gsnedders
Recipients	gsnedders, jeff.allen, zyasoft
Date	2015-04-25.23:36:25
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1430004986.38.0.942507280386.issue2340@psf.upfronthosting.co.za>
In-reply-to

Content
I have mixed feelings about this: it's nice to be able to do something like `eval(r'u"\uD800"')` to see whether the Python implementation you're running on supports lone surrogates (without having to hardcode a list of implementations that do/don't). It also seems a bit evil, given that in a fair few cases it'll suddenly fail much later on when dealing with the string (when toString is called, which isn't obvious from the Python level), and I'm not sure TypeError makes sense there, though I'm not sure what does.

To make a counter-proposal: allow them to parse, but have doing *anything* with them throw the error. This allows for something like:

```
try:
    eval(r'u"\uD800"[0]')
except (SyntaxError, TypeError):
    supportsSurrogates = False
else:
    supportsSurrogates = True

if supportsSurrogates:
    x = u"[\uD800-\uDFFF]"
else:
    x = u""
```

That said, that's quite a lot more work to do. Meh. Not sure what to suggest.
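For what it's worth, the check above could be wrapped up roughly like this. It's only a sketch: `supports_lone_surrogates` and `surrogate_class` are names I've made up, and it assumes an implementation without lone-surrogate support raises SyntaxError or TypeError as in the counter-proposal. The character class is built with `unichr`/`chr` rather than a literal, so the module still compiles on implementations that reject the literal outright.

```python
import re


def supports_lone_surrogates():
    """Return True if a lone surrogate can be parsed and indexed (hypothetical helper)."""
    try:
        # eval defers parsing of the literal to run time, so a SyntaxError
        # is catchable here instead of breaking the whole module.
        ch = eval(r'u"\uD800"[0]')
    except (SyntaxError, TypeError):
        # Assumption: these are the errors an unsupporting implementation raises.
        return False
    return ord(ch) == 0xD800


def surrogate_class():
    """Return a regex character class matching surrogates, or u"" if unsupported."""
    if not supports_lone_surrogates():
        return u""
    # unichr only exists on Python 2; it is never evaluated on Python 3.
    make_char = unichr if str is bytes else chr
    return u"[" + make_char(0xD800) + u"-" + make_char(0xDFFF) + u"]"


if __name__ == "__main__":
    pattern = surrogate_class()
    if pattern:
        print(re.search(pattern, u"a" + pattern[1] + u"b") is not None)
    else:
        print("lone surrogates not supported here")
```

This still relies on indexing being one of the operations that throws, which is exactly the part that would be the extra work.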

Also, just to point out: in principle, "\UFFFFFFFF" should still fail, since it is far beyond U+10FFFF, the last Unicode code point. Nothing else actually uses 32-bit code units, despite what the documentation suggests.
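A quick way to see what a given implementation does with such escapes is something like the following. Again just a sketch: `literal_parses` is a made-up helper, and it assumes a rejected escape surfaces as SyntaxError (that's what CPython does; I haven't checked what else might be raised elsewhere).

```python
def literal_parses(source):
    """Return True if the given string-literal source evaluates without a SyntaxError."""
    try:
        eval(source)
    except SyntaxError:
        # Assumption: an out-of-range escape is reported as a SyntaxError.
        return False
    return True


if __name__ == "__main__":
    # U+10FFFF is the last valid code point; 0xFFFFFFFF is far out of range.
    for src in (r'u"\U0010FFFF"', r'u"\UFFFFFFFF"'):
        print(src, literal_parses(src))
```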

Speaking of PEP 393 and Python 3.3, note that the definition of the Unicode type changed to be "a sequence of values that represent Unicode code points", which implicitly allows surrogates (both unpaired and paired), so the hypothetical Python 3 release of Jython will eventually have to deal with them properly somehow.

History
Date	User	Action	Args
2015-04-25 23:36:26	gsnedders	set	messageid: <1430004986.38.0.942507280386.issue2340@psf.upfronthosting.co.za>
2015-04-25 23:36:26	gsnedders	set	recipients: + gsnedders, zyasoft, jeff.allen
2015-04-25 23:36:26	gsnedders	link	issue2340 messages
2015-04-25 23:36:25	gsnedders	create