Issue1355

classification
Title: Raw-Unicode-Escape encoding not fully supported
Type:
Severity: normal
Components: Core
Versions: 25rc4
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: pjenvey
Nosy List: fwierzbicki, pjenvey, undefined, zyasoft
Priority: urgent
Keywords: patch

Created on 2009-05-22.03:15:02 by undefined, last changed 2009-05-30.02:08:42 by pjenvey.

Files
File name Uploaded Description
raw_unicode_escape.diff pjenvey, 2009-05-29.06:05:39
Messages
msg4707 Author: (undefined) Date: 2009-05-22.03:15:02
Following the latest tutorial for Python 2.5, specifically
http://www.python.org/doc/2.5.4/tut/node5.html...

Incorrect (actual output):
>>> ur'Hello\u0020World !'
u'Hello\\u0020World !'

Correct:
>>> ur'Hello\\u0020World !'
u'Hello\\\\u0020World !'

The first one should be: u'Hello World !' (as shown in the tutorial and
tested on Python 2.5.2).
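
For reference, the expected rule can be checked against the raw-unicode-escape codec directly. This is a minimal sketch, assuming CPython 3, where the ur'' prefix no longer exists; byte strings decoded with 'raw_unicode_escape' stand in for the literals above, and the variable names are illustrative only.

```python
# Assumption: CPython 3. ur'' literals are gone there, so the bytes below
# are decoded with the raw_unicode_escape codec, which follows the same
# rule the tutorial describes for ur'' literals.

# One backslash before 'u' (odd count): the \uXXXX escape is interpreted.
single = b'Hello\\u0020World !'.decode('raw_unicode_escape')

# Two backslashes before 'u' (even count): the sequence is left untouched.
double = b'Hello\\\\u0020World !'.decode('raw_unicode_escape')

print(repr(single))
print(repr(double))
```

The first value corresponds to the case reported as incorrect, the second to the case that already behaved correctly.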
msg4715 Author: Jim Baker (zyasoft) Date: 2009-05-23.21:00:21
This is a candidate for blocking the next RC, although it's likely to be
minor in actual usage.
msg4756 Author: Philip Jenvey (pjenvey) Date: 2009-05-29.06:05:30
This is an easy fix (see the GrammarActions.java diff), but it made me
notice that the raw unicode codec wasn't handling \U sequences.

The attached patch, up for review, fixes both problems, though
EncodeRawUnicodeEscape isn't as efficient as it could be. This bug
shouldn't really go out with 2.5, but it's also really late in the
release process.
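
To illustrate what handling \U sequences means on the encoding side, here is a rough pure-Python sketch. It is not the code from the attached patch: encode_raw_unicode_escape and sample are hypothetical names, and the sketch assumes CPython 3, using the built-in codec only as a point of comparison.

```python
def encode_raw_unicode_escape(text):
    """Hypothetical sketch of a raw-unicode-escape encoder.

    Not the Jython implementation; it only mirrors the escaping rule.
    """
    out = bytearray()
    for ch in text:  # single pass over the code points
        cp = ord(ch)
        if cp < 0x100:
            # Latin-1 range: emitted as one raw byte, never escaped.
            out.append(cp)
        elif cp < 0x10000:
            # Rest of the BMP: \uXXXX (four hex digits).
            out += ('\\u%04x' % cp).encode('ascii')
        else:
            # Beyond the BMP: \UXXXXXXXX (eight hex digits).
            out += ('\\U%08x' % cp).encode('ascii')
    return bytes(out)


# Latin-1 character, BMP character, and a non-BMP character.
sample = 'caf\xe9 \u20ac \U0001F600'

print(encode_raw_unicode_escape(sample))
print(sample.encode('raw_unicode_escape'))
```

Both print calls should show the same bytes, with the non-BMP character written as an eight-digit \U escape.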
msg4759 Author: Jim Baker (zyasoft) Date: 2009-05-29.14:10:44
The patch looks fine to me, and it's not that inefficient (small constant
factors here). But if you want to squeeze out a bit more without too much
work:

Allocate the StringBuilder buffer with a smaller multiple; 10x is way too
much. I think 1x should be fine: this will be optimal for unescaped
ASCII strings, and for the rest StringBuilder should converge pretty
quickly to the right size anyway.

toCodePoints is just a wrapper around PyUnicode.newSubsequenceIterator, so
you can avoid one extra loop.
msg4762 Author: Philip Jenvey (pjenvey) Date: 2009-05-30.02:08:42
Applied in r6424 with further cleanup/fixes, mostly to the decoder.
History
Date User Action Args
2009-05-30 02:08:42 pjenvey set status: open -> closed
resolution: accepted -> fixed
messages: + msg4762
2009-05-29 14:10:45 zyasoft set messages: + msg4759
2009-05-29 06:06:13 pjenvey set priority: high -> urgent
files: + raw_unicode_escape.diff
messages: + msg4756
keywords: + patch
2009-05-29 01:42:54 pjenvey set assignee: pjenvey
nosy: + pjenvey
2009-05-23 21:00:25 zyasoft set messages: + msg4716
2009-05-23 21:00:21 zyasoft set priority: high
nosy: + fwierzbicki, zyasoft
resolution: accepted
messages: + msg4715
2009-05-22 03:15:02 undefined create