Title: Raw-Unicode-Escape encoding not fully supported
Severity: normal
Components: Core
Versions: 25rc4
Status: closed
Resolution: fixed
Assigned To: pjenvey
Nosy List: fwierzbicki, pjenvey, undefined, zyasoft
Created on 2009-05-22.03:15:02 by undefined, last changed 2009-05-30.02:08:42 by pjenvey.

Files:
raw_unicode_escape.diff, uploaded by pjenvey, 2009-05-29.06:05:39
msg4707 Author: (undefined) Date: 2009-05-22.03:15:02
Following the latest Python 2.5 tutorial, specifically its raw Unicode string examples, Jython gives:

>>> ur'Hello\u0020World !'
u'Hello\\u0020World !'

>>> ur'Hello\\u0020World !'
u'Hello\\\\u0020World !'

The first result should be u'Hello World !', as shown in the tutorial and as tested on Python 2.5.2. The second result is already correct.
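For reference, the decoding rule the tutorial relies on can be sketched in pure Python 3. decode_raw_unicode_escape is a hypothetical helper written for illustration, not Jython's or CPython's code. It assumes CPython's raw-unicode-escape semantics: each byte maps to the code point of the same value (Latin-1), and only \uXXXX and \UXXXXXXXX are escapes, recognized only when the u/U follows an odd number of backslashes. In effect, a Python 2 ur'...' literal is decoded this way, which is why the first example yields a space and the second keeps its backslashes.

```python
import string

# Byte values of the ASCII hex digits 0-9, a-f, A-F.
HEX_DIGITS = frozenset(string.hexdigits.encode("ascii"))
BACKSLASH, LOWER_U, UPPER_U = 0x5C, 0x75, 0x55


def decode_raw_unicode_escape(data: bytes) -> str:
    """Hypothetical pure-Python sketch of raw-unicode-escape decoding."""
    out = []
    i, n = 0, len(data)
    while i < n:
        if data[i] != BACKSLASH:
            # Ordinary byte: maps straight to U+0000..U+00FF (Latin-1).
            out.append(chr(data[i]))
            i += 1
            continue
        # Measure the run of consecutive backslashes.
        start = i
        while i < n and data[i] == BACKSLASH:
            i += 1
        run = i - start
        # Only an odd-length run followed by 'u' or 'U' starts an escape.
        if run % 2 == 0 or i >= n or data[i] not in (LOWER_U, UPPER_U):
            out.append("\\" * run)
            continue
        # The final backslash belongs to the escape; the others stay literal.
        out.append("\\" * (run - 1))
        width = 4 if data[i] == LOWER_U else 8
        digits = data[i + 1:i + 1 + width]
        if len(digits) < width or any(d not in HEX_DIGITS for d in digits):
            raise ValueError("truncated escape at byte offset %d" % (i - 1))
        code_point = int(digits, 16)
        if code_point > 0x10FFFF:
            raise ValueError("escape out of range at byte offset %d" % (i - 1))
        out.append(chr(code_point))
        i += 1 + width
    return "".join(out)


print(decode_raw_unicode_escape(rb"Hello\u0020World !"))   # Hello World !
print(decode_raw_unicode_escape(rb"Hello\\u0020World !"))  # Hello\\u0020World !
```

The second call leaves both backslashes and the u0020 untouched, matching the tutorial's u'Hello\\\\u0020World !'.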
msg4715 Author: Jim Baker (zyasoft) Date: 2009-05-23.21:00:21
This is a candidate for blocking the next RC, although it's likely to be
minor in actual usage.
msg4716 Author: Jim Baker (zyasoft) Date: 2009-05-23.21:00:25
msg4756 Author: Philip Jenvey (pjenvey) Date: 2009-05-29.06:05:30
This is an easy fix (see the diff), but it made me notice that the raw unicode codec wasn't handling \U sequences either.

The attached patch (up for review) fixes both problems, though EncodeRawUnicodeEscape isn't as efficient as it could be. This bug really shouldn't go out with 2.5, but it's also very late in the release process.
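On the encoding side, the missing case is code points above U+FFFF, which need an eight-digit \UXXXXXXXX escape. The sketch below assumes CPython's raw_unicode_escape output format: Latin-1 bytes below U+0100, lowercase \uXXXX up to U+FFFF, and \UXXXXXXXX beyond. encode_raw_unicode_escape is a hypothetical Python 3 helper, not the code in raw_unicode_escape.diff. Because Java strings are UTF-16, a JVM implementation also has to join surrogate pairs into a single code point before choosing the escape; a Python 3 str already iterates by code point.

```python
def encode_raw_unicode_escape(text: str) -> bytes:
    """Hypothetical sketch of a raw-unicode-escape encoder (not the patch itself)."""
    out = bytearray()
    for ch in text:  # a Python 3 str iterates by whole code point
        cp = ord(ch)
        if cp < 0x100:
            out.append(cp)  # Latin-1 range: one byte, backslashes left as-is
        elif cp <= 0xFFFF:
            out += b"\\u%04x" % cp  # BMP: \uXXXX
        else:
            out += b"\\U%08x" % cp  # supplementary planes: \UXXXXXXXX
    return bytes(out)


sample = "caf\u00e9 \u20ac \U0001F600"
print(encode_raw_unicode_escape(sample))  # b'caf\xe9 \\u20ac \\U0001f600'
```

Backslashes themselves are not escaped, so this encoding does not round-trip text that already contains a literal \u sequence.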
msg4759 Author: Jim Baker (zyasoft) Date: 2009-05-29.14:10:44
The patch looks fine to me, and it's not that inefficient (only small constant factors). But if you want to squeeze out a bit more without too much work:

Allocate the StringBuilder buffer with a smaller multiple of the input length; 10x is way too much. I think 1x should be fine: it will be optimal for ASCII strings that need no escaping, and for everything else StringBuilder should grow to the right size pretty quickly anyway.

toCodePoints is just a wrapper around PyUnicode.newSubsequenceIterator, so you can iterate over that directly and avoid one extra loop.
msg4762 Author: Philip Jenvey (pjenvey) Date: 2009-05-30.02:08:42
Applied in r6424 with further cleanup and fixes, mostly to the decoder.
