Issue1355

classification

Title:	Raw-Unicode-Escape encoding not fully supported
Type:		Severity:	normal
Components:	Core	Versions:	25rc4
		Milestone:

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	pjenvey	Nosy List:	fwierzbicki, pjenvey, undefined, zyasoft
Priority:	urgent	Keywords:	patch

Created on 2009-05-22.03:15:02 by undefined, last changed 2009-05-30.02:08:42 by pjenvey.

Files
File name	Uploaded	Description
raw_unicode_escape.diff	pjenvey, 2009-05-29.06:05:39

Messages
msg4707 (view)	Author: (undefined)	Date: 2009-05-22.03:15:02
Following the latest tutorial for Python 2.5, specifically http://www.python.org/doc/2.5.4/tut/node5.html...

incorrect:
>>> ur'Hello\u0020World !'
u'Hello\\u0020World !'

correct:
>>> ur'Hello\\u0020World !'
u'Hello\\\\u0020World !'

That first one should be: u'Hello World !' (as shown in the tutorial and tested on Python 2.5.2).
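The expected behaviour in this report can be checked against CPython's built-in raw_unicode_escape codec. The following is a minimal sketch, assuming Python 3 (where the ur'' prefix no longer exists, so the bytes of the literal body are decoded explicitly). The helper name decode_raw_literal is hypothetical and is not part of Jython or CPython.

```python
def decode_raw_literal(source_bytes):
    """Decode the body of a raw unicode literal with CPython's codec.

    Hypothetical helper for illustration only.
    """
    return source_bytes.decode("raw_unicode_escape")


# One backslash before 'u': an odd run, so the escape is interpreted.
single = decode_raw_literal(rb"Hello\u0020World !")

# Two backslashes before 'u': an even run, so nothing is interpreted
# and both backslashes stay in the result.
double = decode_raw_literal(rb"Hello\\u0020World !")

print(repr(single))
print(repr(double))
```

The first case is the one reported as incorrect here; the second already matched CPython.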
msg4715 (view)	Author: Jim Baker (zyasoft)	Date: 2009-05-23.21:00:21
This is a candidate for blocking the next RC, although it's likely to be minor in actual usage.
msg4756 (view)	Author: Philip Jenvey (pjenvey)	Date: 2009-05-29.06:05:30
This is an easy fix (see the GrammarActions.java diff), but it made me notice that the raw unicode codec wasn't handling \U sequences. The attached patch for review fixes both problems, though EncodeRawUnicodeEscape isn't as efficient as it could be. This bug shouldn't really go out with 2.5, but it's also really late in the release process.
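For reference, the encoding side of the codec, including the \U form for code points beyond the BMP, can be sketched as below. This is an illustration written against CPython 3's behaviour, not the code in the attached patch; the function name encode_raw_unicode_escape is an assumption chosen for this example.

```python
def encode_raw_unicode_escape(text):
    """Sketch of raw-unicode-escape encoding, one code point at a time.

    Illustrative only; mirrors what CPython 3's codec produces.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x100:
            # Latin-1 range: emitted as a single byte, backslash included.
            out.append(cp)
        elif cp < 0x10000:
            # Rest of the BMP: \uXXXX with four hex digits.
            out += ("\\u%04x" % cp).encode("ascii")
        else:
            # Beyond the BMP: \UXXXXXXXX with eight hex digits.
            out += ("\\U%08x" % cp).encode("ascii")
    return bytes(out)


sample = "A\u20ac\U0001d11e"
encoded = encode_raw_unicode_escape(sample)
print(encoded)
```

Comparing the result with sample.encode("raw_unicode_escape") is a quick way to confirm that the sketch and the built-in codec agree on a given input.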
msg4759 (view)	Author: Jim Baker (zyasoft)	Date: 2009-05-29.14:10:44
The patch looks fine to me, and it's not so inefficient (the constant factors here are small). But if you want to squeeze out a bit more without too much work: allocate the StringBuilder buffer with a smaller multiple, since 10x is way too much. I think 1x should be fine; it will be optimal for unescaped ASCII strings, and for the rest StringBuilder should converge pretty quickly to the right size anyway. Also, toCodePoints is just a wrapper of PyUnicode.newSubsequenceIterator, so you can avoid one extra loop.
msg4762 (view)	Author: Philip Jenvey (pjenvey)	Date: 2009-05-30.02:08:42
Applied in r6424 with further cleanup/fixes, mostly to the decoder.

History
Date	User	Action	Args
2009-05-30 02:08:42	pjenvey	set	status: open -> closed resolution: accepted -> fixed messages: + msg4762
2009-05-29 14:10:45	zyasoft	set	messages: + msg4759
2009-05-29 06:06:13	pjenvey	set	priority: high -> urgent files: + raw_unicode_escape.diff messages: + msg4756 keywords: + patch
2009-05-29 01:42:54	pjenvey	set	assignee: pjenvey nosy: + pjenvey
2009-05-23 21:00:25	zyasoft	set	messages: + msg4716
2009-05-23 21:00:21	zyasoft	set	priority: high nosy: + fwierzbicki, zyasoft resolution: accepted messages: + msg4715
2009-05-22 03:15:02	undefined	create